
Diff for "PolishDiscourseCorpus"

Differences between revisions 1 and 7 (spanning 6 versions)
Revision 1 as of 2020-12-18 16:27:31 (size: 1705)
Revision 7 as of 2022-02-01 10:47:42 (size: 997)
Line 4: Line 4:
This page offers the official [[http://creativecommons.org/licenses/by/3.0/deed.en_US|Creative Commons Attribution 3.0 Unported License]] release of the corpus of discourse relations created as a part of the [[http://clip.ipipan.waw.pl/CLARIN-PL-2|CLARIN-PL]] project. By downloading the corpus data you accept the conditions of that licence. The following corpus of discourse relations is based on the [[PCC|Polish Coreference Corpus]] as part of the [[http://clip.ipipan.waw.pl/CLARIN-PL-2|CLARIN-PL]] project. The annotation of the corpus was completed using the [[Discann|Discann annotation tool]].
Line 6: Line 6:
'''Contact person:'''
[[MaciejOgrodniczuk|Maciej Ogrodniczuk]]<<BR>>
'''License:''' CC BY v.3
== Documentation ==

Please see the [[attachment:instrukcja-anotacji-metatekstu.pdf|annotation instructions]], in Polish (by Celina Heliasz).

== Licence ==

[[http://creativecommons.org/licenses/by/3.0/deed.en_US|Creative Commons Attribution 3.0 Unported License]]
Line 12: Line 16:
== Documentation ==

 * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
 * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]
Line 19: Line 18:
The corpus is available for download in 3 formats:
 * [[attachment:PCC-1.5-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
 * [[attachment:PCC-1.5-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
 * [[attachment:PCC-1.5-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])
The corpus is available for download in the form of a [[attachment:corpus.tar.gz|gzip-compressed tar archive]] containing:
 * 1773 source XML TEI files of the Polish Coreference Corpus
 * metatext.xml file containing descriptions of all relations
Line 24: Line 22:
== Online version ==

The corpus is available:
 * [[http://cothec.nlp.ipipan.waw.pl/|for browsing]]
 * [[http://pcc.nlp.ipipan.waw.pl/|for search]]

You may also want to see [[PolishCoreferenceTools|Polish Coreference Tools site]].

== Citing ==
When using Polish Coreference Corpus, please cite our book on coreference:
<<BibMate(key, "ogr:etal:15:gruyter", omitYears=true)>>

but you can also check [[http://core.ipipan.waw.pl/|the project page]] for earlier publications.
== Publication ==
<<BibMate(key, "hel:ogr:19:lc", omitYears=true)>>

Polish Discourse Corpus / Polski Korpus Metatekstowy

The following corpus of discourse relations is based on the Polish Coreference Corpus and was created as part of the CLARIN-PL project. The annotation of the corpus was completed using the Discann annotation tool.

Documentation

Please see the annotation instructions, in Polish (by Celina Heliasz).

Licence

Creative Commons Attribution 3.0 Unported License


Downloads

The corpus is available for download in the form of a gzip-compressed tar archive (corpus.tar.gz) containing:

  • 1773 source XML TEI files of the Polish Coreference Corpus
  • metatext.xml file containing descriptions of all relations
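After downloading, you might want a quick sanity check that the archive holds the two kinds of files listed above. The sketch below uses only the Python standard library and makes several assumptions. First, the archive is read as a tar file with automatic compression detection. Second, every .xml file other than metatext.xml is treated as one of the TEI source files. Third, nothing is assumed about the element names inside metatext.xml, so the sketch only reports its root tag and number of top-level children. The function name `summarize_archive` is hypothetical and not part of any published corpus tooling.

```python
import tarfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# File name taken from this page; its internal schema is NOT documented here,
# so only generic structure (root tag, number of direct children) is reported.
METATEXT_NAME = "metatext.xml"


def summarize_archive(path: str) -> dict:
    """Hypothetical helper: count TEI XML files and inspect metatext.xml."""
    tei_count = 0
    metatext_info = None
    # "r:*" opens a tar archive for reading, detecting compression (e.g. gzip)
    with tarfile.open(path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            name = PurePosixPath(member.name).name
            if name == METATEXT_NAME:
                with archive.extractfile(member) as fh:
                    root = ET.parse(fh).getroot()
                metatext_info = {"root_tag": root.tag, "children": len(root)}
            elif name.endswith(".xml"):
                # Assumption: any other XML file is a TEI source file
                tei_count += 1
    return {"tei_files": tei_count, "metatext": metatext_info}
```

For example, `summarize_archive("corpus.tar.gz")["tei_files"]` could be compared against the 1773 files stated above. A mismatch would only suggest a closer look, because the counting rule is an assumption about the archive layout.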

Publication

List of publications

Celina Heliasz and Maciej Ogrodniczuk. Eksplicytność a implicytność w świetle analizy korpusowej (meta)tekstu. Linguistica Copernicana, 16:75–100, 2019.