Diff for "PolishDiscourseCorpus"

Differences between revisions 2 and 4 (spanning 2 versions)
Revision 2 as of 2020-12-18 16:34:33
Size: 1497
Comment:
Revision 4 as of 2020-12-30 15:19:35
Size: 916
Comment:
Line 4: Line 4:
- This page offers the official release of the corpus of discourse relations created as a part of the [[http://clip.ipipan.waw.pl/CLARIN-PL-2|CLARIN-PL]] project. By downloading the corpus data you accept the conditions of that licence.
+ The Polish Discourse Corpus is a corpus of discourse relations based on the [[PCC|Polish Coreference Corpus]] as part of the [[http://clip.ipipan.waw.pl/CLARIN-PL-2|CLARIN-PL]] project.
Line 8: Line 8:
-  * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
-  * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]
+ Please see the [[attachment:instrukcja-anotacji-metatekstu.pdf|annotation instructions]], in Polish.
Line 17: Line 16:
Line 20: Line 18:
- The corpus is available for download in 3 formats:
-  * [[attachment:PCC-1.5-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
-  * [[attachment:PCC-1.5-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
-  * [[attachment:PCC-1.5-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])
-
- == Online version ==
-
- The corpus is available:
-  * [[http://cothec.nlp.ipipan.waw.pl/|for browsing]]
-  * [[http://pcc.nlp.ipipan.waw.pl/|for search]]
-
- You may also want to see [[PolishCoreferenceTools|Polish Coreference Tools site]].
+ The corpus is available for download in the form of a [[attachment:corpus.tar.gz|zip file]] containing:
+  * 1773 source XML TEI files of the Polish Coreference Corpus
+  * metatext.xml file containing descriptions of all relations
Line 34: Line 23:
- When using Polish Discourse Corpus, please cite:
+ Please cite:

Polish Discourse Corpus / Polski Korpus Metatekstowy

The Polish Discourse Corpus is a corpus of discourse relations based on the Polish Coreference Corpus as part of the CLARIN-PL project.

Documentation

Please see the annotation instructions, in Polish.

Licence

Creative Commons Attribution 3.0 Unported License


Downloads

The corpus is available for download as a compressed archive (corpus.tar.gz) containing:

  • 1773 source XML TEI files of the Polish Coreference Corpus
  • metatext.xml file containing descriptions of all relations
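
As a quick sanity check after downloading, the archive contents listed above can be inspected with a short script. This is only a sketch: it assumes the attachment really is a tar archive (its name is corpus.tar.gz), and the function name summarize_archive is hypothetical. The internal directory layout is not documented on this page, so the script matches on file names alone and makes no claim about what the XML count should be.

```python
import tarfile
from pathlib import PurePosixPath


def summarize_archive(archive_path):
    """Count XML members and check for metatext.xml in a tar archive.

    Assumption: the download is a tar archive (gzip-compressed or not).
    The directory layout inside it is unknown, so only base names are used.
    """
    xml_count = 0
    has_metatext = False
    # "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            base_name = PurePosixPath(member.name).name
            if base_name == "metatext.xml":
                has_metatext = True
            elif base_name.endswith(".xml"):
                xml_count += 1
    return {"other_xml_files": xml_count, "has_metatext": has_metatext}
```

Calling summarize_archive("corpus.tar.gz") on the downloaded file returns a small dictionary that can be compared by hand against the description above.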

Citing

Please cite:

Celina Heliasz and Maciej Ogrodniczuk. Eksplicytność a implicytność w świetle analizy korpusowej (meta)tekstu. Linguistica Copernicana, 16:75–100, 2019.