Differences between revisions 4 and 60 (spanning 56 versions)
Size: 2333
Revision 60 as of 2023-04-24 13:05:58
Size: 1775
Line 2: | Line 2: |
= Polish Coreference Corpus =
This page describes the corpus of Polish coreference, which was created as a part of the [[CORE]] project. |
= Polish Coreference Corpus / Korpus zależności referencyjnych =
This page offers the official release of the corpus of Polish coreference, created as part of the [[CORE]] and [[COTHEC]] projects, under the [[https://creativecommons.org/licenses/by-nc/4.0/deed.pl|Creative Commons Attribution-NonCommercial 4.0 International License]]. By downloading the corpus data you accept the conditions of that license. |
Line 5: | Line 5: |
To be updated. | '''Contact person:''' [[MaciejOgrodniczuk|Maciej Ogrodniczuk]]<<BR>> |
Line 7: | Line 8: |
|| '''Text type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
||Dailies ||459 ||127500 ||25.5% ||
||Magazines ||406 ||117500 ||23.5% ||
||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% ||
||Non-fiction literature ||96 ||27500 ||5.5% ||
||Instructive writing and textbooks ||100 ||27500 ||5.5% ||
||Spoken – conversational ||83 ||25000 ||5% ||
||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% ||
||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% ||
||Miscellaneous written (legal, advertisements, user manuals, letters) ||55 ||15000 ||3% ||
||Spoken from the media ||44 ||12500 ||2.5% ||
||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% ||
||Academic writing and textbooks ||35 ||10000 ||2% ||
||Unclassified written ||19 ||5000 ||1% ||
||Journalistic books ||19 ||5000 ||1% ||
||''Total'' ||''1773'' ||''500000'' ||''100%'' || |
'''License:''' CC BY-NC 4.0 {{http://i.creativecommons.org/l/by-nc/4.0/88x31.png}}

== Documentation ==
 * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
 * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]

== Downloads ==
The corpus is available for download in 3 formats:
 * [[attachment:PCC-1.5-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
 * [[attachment:PCC-1.5-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
 * [[attachment:PCC-1.5-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])

== Online version ==
The corpus is available:
 * [[http://cothec.nlp.ipipan.waw.pl/|for browsing]]
 * [[http://pcc.nlp.ipipan.waw.pl/|for search]]

You may also want to see the [[PolishCoreferenceTools|Polish Coreference Tools site]].

== Citing ==
When using the Polish Coreference Corpus, please cite our books on coreference:

<<BibMate(key, "ogr:etal:15:gruyter", omitYears=true)>>

<<BibMate(key, "ogr:19:wuw", omitYears=true)>>

You can also check [[http://core.ipipan.waw.pl/|the project page]] for earlier publications. |