Differences between revisions 6 and 59 (spanning 53 versions)
Line 2: | Line 2: |
= Polish Coreference Corpus =
This page describes the corpus of Polish coreference, which was created as a part of the [[CORE]] project. |
= Polish Coreference Corpus / Korpus zależności referencyjnych =
This page offers the official [[http://creativecommons.org/licenses/by/3.0/deed.en_US|Creative Commons Attribution 3.0 Unported License]] release of the corpus of Polish coreference, which was created as a part of the [[CORE]] and [[COTHEC]] projects. By downloading the corpus data you accept the conditions of that licence. |
Line 5: | Line 5: |
Approximate corpus texts type distribution: | '''Contact person:''' [[MaciejOgrodniczuk|Maciej Ogrodniczuk]]<<BR>> '''License:''' CC-BY-NC v.4 |
Line 7: | Line 9: |
|| '''Texts type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
||Dailies ||459 ||127500 ||25.5% ||
||Magazines ||406 ||117500 ||23.5% ||
||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% ||
||Non-fiction literature ||96 ||27500 ||5.5% ||
||Instructive writing and textbooks ||100 ||27500 ||5.5% ||
||Spoken – conversational ||83 ||25000 ||5% ||
||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% ||
||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% ||
||Miscellaneous written (legal, advertisements, user manuals, letters) ||55 ||15000 ||3% ||
||Spoken from the media ||44 ||12500 ||2.5% ||
||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% ||
||Academic writing and textbooks ||35 ||10000 ||2% ||
||Unclassified written ||19 ||5000 ||1% ||
||Journalistic books ||19 ||5000 ||1% ||
||''Total'' ||''1773'' ||''500000'' ||''100%'' || |
{{http://i.creativecommons.org/l/by-nc/4.0/88x31.png}} |
Line 24: | Line 11: |
To be updated. |
== Documentation ==
 * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
 * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]
== Downloads ==
The corpus is available for download in 3 formats:
 * [[attachment:PCC-1.5-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
 * [[attachment:PCC-1.5-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
 * [[attachment:PCC-1.5-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])
== Online version ==
The corpus is available:
 * [[http://cothec.nlp.ipipan.waw.pl/|for browsing]]
 * [[http://pcc.nlp.ipipan.waw.pl/|for search]] |
Line 27: | Line 30: |
== Citing ==
When using Polish Coreference Corpus, please cite our books on coreference:
<<BibMate(key, "ogr:etal:15:gruyter", omitYears=true)>>
<<BibMate(key, "ogr:19:wuw", omitYears=true)>>
but you can also check [[http://core.ipipan.waw.pl/|the project page]] for earlier publications. |
Polish Coreference Corpus / Korpus zależności referencyjnych
This page offers the official Creative Commons Attribution 3.0 Unported License release of the corpus of Polish coreference, which was created as a part of the CORE and COTHEC projects. By downloading the corpus data you accept the conditions of that licence.
Contact person: Maciej Ogrodniczuk
License: CC-BY-NC v.4
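Documentation
 * Description of the corpus, in English (PCC_README_EN.pdf)
 * Description of the corpus, in Polish (PCC_README_PL.pdf)
Downloads
The corpus is available for download in 3 formats:
 * full corpus in MMAX format (PCC-1.5-MMAX.zip); example text in MMAX format (example_text_mmax.zip)
 * full corpus in TEI format (PCC-1.5-TEI.zip); example text in TEI format (example_text_tei.zip)
 * full corpus in BRAT format (PCC-1.5-BRAT.zip); example text in BRAT format (example_text_brat.zip)
Online version
The corpus is available:
 * for browsing: http://cothec.nlp.ipipan.waw.pl/
 * for search: http://pcc.nlp.ipipan.waw.pl/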
You may also want to see the Polish Coreference Tools site.
Citing
When using the Polish Coreference Corpus, please cite our books on coreference (BibMate keys: ogr:etal:15:gruyter, ogr:19:wuw).
You can also check the project page (http://core.ipipan.waw.pl/) for earlier publications.