
Diff for "PolishCoreferenceCorpus"

Differences between revisions 6 and 7
Revision 6 as of 2013-01-08 12:06:40
Size: 2466
Editor: MateuszKopec
Comment:
Revision 7 as of 2013-01-08 12:10:23
Size: 805
Editor: MateuszKopec
Comment:
Line 5: Line 5:
Approximate distribution of corpus text types:
This page offers the official [[http://www.gnu.org/licenses/gpl.html|GNU General Public License]] release of the corpus of Polish coreference, which was created as part of the [[CORE]] project. By downloading the corpus data you accept the conditions of that licence.
Line 7: Line 7:
|| '''Text type''' || '''# of texts''' || '''# of segments''' || '''% of segments''' ||
||Dailies ||459 ||127500 ||25.5% ||
||Magazines ||406 ||117500 ||23.5% ||
||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% ||
||Non-fiction literature ||96 ||27500 ||5.5% ||
||Instructive writing and textbooks ||100 ||27500 ||5.5% ||
||Spoken – conversational ||83 ||25000 ||5% ||
||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% ||
||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% ||
||Miscellaneous written (legal, advertisements, user manuals, letters)||55 ||15000 ||3% ||
||Spoken from the media ||44 ||12500 ||2.5% ||
||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% ||
||Academic writing and textbooks ||35 ||10000 ||2% ||
||Unclassified written ||19 ||5000 ||1% ||
||Journalistic books ||19 ||5000 ||1% ||
||''Total'' ||''1773'' ||''500000'' ||''100%'' ||
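The Percent column is each row's share of the 500,000 segments (for example, 127,500 / 500,000 = 25.5% for dailies). The short Python sketch below re-derives that column and checks that the text and segment counts add up to the totals. It is only a consistency check on the numbers copied from the table. The row labels are abbreviated, and the `Row` type and `check_table` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    text_type: str   # abbreviated label from the table
    texts: int       # "# of texts"
    segments: int    # "# of segments"
    percent: float   # percentage as printed in the table


# Values copied from the distribution table; labels shortened.
ROWS = [
    Row("Dailies", 459, 127500, 25.5),
    Row("Magazines", 406, 117500, 23.5),
    Row("Fiction literature", 288, 80000, 16.0),
    Row("Non-fiction literature", 96, 27500, 5.5),
    Row("Instructive writing and textbooks", 100, 27500, 5.5),
    Row("Spoken, conversational", 83, 25000, 5.0),
    Row("Internet, interactive", 63, 17500, 3.5),
    Row("Internet, non-interactive", 63, 17500, 3.5),
    Row("Miscellaneous written", 55, 15000, 3.0),
    Row("Spoken from the media", 44, 12500, 2.5),
    Row("Quasi-spoken", 43, 12500, 2.5),
    Row("Academic writing and textbooks", 35, 10000, 2.0),
    Row("Unclassified written", 19, 5000, 1.0),
    Row("Journalistic books", 19, 5000, 1.0),
]


def check_table(rows, total_texts=1773, total_segments=500000):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    problems = []
    if sum(r.texts for r in rows) != total_texts:
        problems.append("text counts do not sum to total")
    if sum(r.segments for r in rows) != total_segments:
        problems.append("segment counts do not sum to total")
    for r in rows:
        # Percentage of all segments, compared with the printed value
        # (the table rounds to one decimal place).
        pct = 100 * r.segments / total_segments
        if abs(pct - r.percent) > 0.05:
            problems.append(f"{r.text_type}: expected {pct:.1f}%")
    return problems


if __name__ == "__main__":
    print(check_table(ROWS) or "table is consistent")
```

With the figures above, every row's percentage matches its segment count and both columns sum to the stated totals.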
== Documentation ==
Line 24: Line 9:
To be updated. Description of the corpus, in English: [[attachment:readme.pdf]]

== Downloads ==

For the time being, a preliminary version of the corpus is available for download in two formats:
 * [[attachment:pcc_mmax.tgz|MMAX]]
 * [[attachment:pcc_tei.tgz|TEI]]
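Both downloads are gzip-compressed tar archives (`.tgz`). Their internal file layout is not described on this page, so the Python sketch below makes no assumptions about it. It only opens an archive with the standard `tarfile` module and counts the regular files by extension, which gives a quick overview after downloading. The function name `summarize_archive` is hypothetical.

```python
import sys
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path):
    """Count regular files in a .tgz archive, grouped by lower-cased extension."""
    counts = Counter()
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                # Tar member names use "/" separators regardless of platform.
                suffix = PurePosixPath(member.name).suffix.lower() or "(none)"
                counts[suffix] += 1
    return counts


if __name__ == "__main__":
    # Usage (assumed local file names): python summarize.py pcc_mmax.tgz pcc_tei.tgz
    for path in sys.argv[1:]:
        print(path)
        for suffix, n in summarize_archive(path).most_common():
            print(f"  {suffix}: {n}")
```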

Polish Coreference Corpus

This page describes the corpus of Polish coreference, which was created as part of the CORE project.

It offers the official GNU General Public License release of the corpus. By downloading the corpus data you accept the conditions of that licence.

Documentation

Description of the corpus, in English: readme.pdf

Downloads

For the time being, a preliminary version of the corpus is available for download in two formats:
 * MMAX (pcc_mmax.tgz)
 * TEI (pcc_tei.tgz)

You may also want to see the Polish Coreference Tools site.