Revision 22 as of 2026-02-28 01:31:30
Polish Discourse Corpus / Polski Korpus Metatekstowy
The corpus of discourse relations is based on the Polish Coreference Corpus. The annotation of the corpus was completed using the Discann annotation tool.
Version 0.1
Documentation
The annotation instructions (in Polish) were created by Celina Heliasz.
Download
The corpus is available for download as a gzip-compressed tar archive (corpus.tar.gz) containing:
- 1773 source TEI XML files of the Polish Coreference Corpus
- a metatext.xml file containing descriptions of all relations
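A quick way to check a downloaded copy against the contents listed above is to count the XML members of the archive. The following is a minimal sketch, not an official tool: the function name `summarize_corpus_archive` is hypothetical, and the internal directory layout of the archive is an assumption (the sketch only looks at file names, wherever they sit in the archive).

```python
import tarfile
from pathlib import PurePosixPath


def summarize_corpus_archive(archive_path):
    """Return (number of .xml files other than metatext.xml, metatext.xml present?).

    Assumption: the archive is a gzip-compressed tar file; its internal
    directory layout is not documented here, so only base names are checked.
    """
    xml_count = 0
    has_metatext = False
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            name = PurePosixPath(member.name).name
            if name == "metatext.xml":
                has_metatext = True
            elif name.endswith(".xml"):
                xml_count += 1
    return xml_count, has_metatext
```

Calling `summarize_corpus_archive("corpus.tar.gz")` on the downloaded file should then report the number of source XML files and whether metatext.xml was found; the path is whatever location the archive was saved to.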
Funding
Version 0.1 of the corpus was financed by the Polish Ministry of Education and Science under agreement DIR/WK/2016/02.
Version 1.0
Documentation
The annotation instructions (in Polish) were created by Maciej Ogrodniczuk.
Download
The corpus is available for download as a zip file (pdc.zip) in the Inforex format (https://clarin.biz/tools/inforex).
Funding
Version 1.0 of the corpus was financed by:
- the European Regional Development Fund as part of the 2014–2020 Smart Growth Operational Programme, CLARIN: Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00–00C002/19;
- the Polish Ministry of Education and Science grant 2022/WK/09, continued as part of the investment CLARIN ERIC – European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure (period: 2024–2026), funded by the Polish Ministry of Science and Higher Education (programme: "Support for the participation of Polish scientific teams in international research infrastructure projects"), agreement number 2024/WK/01;
- CLARIN-PL, the European Regional Development Fund, FENG programme, agreement number FENG.02.04-IP.040004/24.
Licence
Creative Commons Attribution 3.0 Unported License
Please cite
Bibliography keys (BibMate): ogr:etal:24, tom:etal:24:iso, zur:etal:23:ldk, hel:ogr:19:lc