Differences between revisions 6 and 7
Size: 2466
Size: 805
Line 5: | Line 5: |
Approximate corpus texts type distribution: | This page offers the official [[http://www.gnu.org/licenses/gpl.html|GNU General Public License]] release of the corpus of Polish coreference, which was created as a part of the [[CORE]] project. By downloading the corpus data you accept the conditions of that licence. |
Line 7: | Line 7: |
|| '''Texts type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
||Dailies ||459 ||127500 ||25.5% ||
||Magazines ||406 ||117500 ||23.5% ||
||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% ||
||Non-fiction literature ||96 ||27500 ||5.5% ||
||Instructive writing and textbooks ||100 ||27500 ||5.5% ||
||Spoken – conversational ||83 ||25000 ||5% ||
||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% ||
||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% ||
||Miscellaneous written (legal, advertisements, user manuals, letters) ||55 ||15000 ||3% ||
||Spoken from the media ||44 ||12500 ||2.5% ||
||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% ||
||Academic writing and textbooks ||35 ||10000 ||2% ||
||Unclassified written ||19 ||5000 ||1% ||
||Journalistic books ||19 ||5000 ||1% ||
||''Total'' ||''1773'' ||''500000'' ||''100%'' || |
== Documentation == |
Line 24: | Line 9: |
To be updated. | Description of the corpus, in English: [[attachment:readme.pdf]]
== Downloads ==
For the time being, a preliminary version of the corpus is available for download in two formats:
 * [[attachment:pcc_mmax.tgz|MMAX]]
 * [[attachment:pcc_tei.tgz|TEI]] |
Polish Coreference Corpus
This page offers the official GNU General Public License release of the corpus of Polish coreference, which was created as part of the CORE project. By downloading the corpus data you accept the conditions of that licence.
Documentation
Description of the corpus, in English: readme.pdf
Downloads
For the time being, a preliminary version of the corpus is available for download in two formats: MMAX (pcc_mmax.tgz) and TEI (pcc_tei.tgz).
You may also want to see the Polish Coreference Tools site.
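Once one of the archives has been downloaded, its contents can be inspected and unpacked with standard tools. The following is a minimal Python sketch, assuming the file was saved locally as pcc_tei.tgz (the attachment name used on this page) and is an ordinary gzip-compressed tar archive. The helper names `list_members` and `extract_archive` and the target directory `pcc_tei` are illustrative only, and nothing here assumes a particular directory layout inside the archive.

```python
import tarfile
from pathlib import Path


def list_members(archive_path):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()


def extract_archive(archive_path, target_dir):
    """Unpack the archive into target_dir and return the resolved target path.

    Members whose paths would resolve outside target_dir are rejected
    before anything is written.
    """
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            destination = (target / member.name).resolve()
            if destination != target and target not in destination.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(target)
    return target


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    archive_file = Path("pcc_tei.tgz")
    if archive_file.exists():
        names = list_members(archive_file)
        print(f"{len(names)} entries in {archive_file}")
        extract_archive(archive_file, "pcc_tei")
    else:
        print(f"{archive_file} not found; download it first")
```

The same functions should work for the MMAX archive by changing the file name; listing the members first is a cheap way to see how the release is organised before unpacking it.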