Polish Coreference Corpus
This page describes the Polish Coreference Corpus, which was created as part of the CORE project.
To be updated.
|| '''Text type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
|| Dailies || 459 || 127500 || 25.5% ||
|| Magazines || 406 || 117500 || 23.5% ||
|| Fiction literature (prose, poetry, drama) || 288 || 80000 || 16% ||
|| Non-fiction literature || 96 || 27500 || 5.5% ||
|| Instructive writing and textbooks || 100 || 27500 || 5.5% ||
|| Spoken – conversational || 83 || 25000 || 5% ||
|| Internet – interactive (blogs, forums, usenet) || 63 || 17500 || 3.5% ||
|| Internet – non-interactive (static pages, Wikipedia) || 63 || 17500 || 3.5% ||
|| Miscellaneous written (legal, advertisements, user manuals, letters) || 55 || 15000 || 3% ||
|| Spoken from the media || 44 || 12500 || 2.5% ||
|| Quasi-spoken (parliamentary transcripts) || 43 || 12500 || 2.5% ||
|| Academic writing and textbooks || 35 || 10000 || 2% ||
|| Unclassified written || 19 || 5000 || 1% ||
|| Journalistic books || 19 || 5000 || 1% ||
|| ''Total'' || ''1773'' || ''500000'' || ''100%'' ||
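The Percent column appears to be each text type's share of the 500000 segments, not of the 1773 texts. The following minimal sketch recomputes those shares from the segment counts listed above. The names `SEGMENTS` and `segment_shares` are illustrative only and are not part of any corpus tooling.

```python
# Segment counts per text type, copied from the table above.
SEGMENTS = {
    "Dailies": 127500,
    "Magazines": 117500,
    "Fiction literature (prose, poetry, drama)": 80000,
    "Non-fiction literature": 27500,
    "Instructive writing and textbooks": 27500,
    "Spoken – conversational": 25000,
    "Internet – interactive (blogs, forums, usenet)": 17500,
    "Internet – non-interactive (static pages, Wikipedia)": 17500,
    "Miscellaneous written (legal, advertisements, user manuals, letters)": 15000,
    "Spoken from the media": 12500,
    "Quasi-spoken (parliamentary transcripts)": 12500,
    "Academic writing and textbooks": 10000,
    "Unclassified written": 5000,
    "Journalistic books": 5000,
}


def segment_shares(segments):
    """Return each text type's share of all segments, in percent."""
    total = sum(segments.values())
    return {name: 100 * count / total for name, count in segments.items()}


shares = segment_shares(SEGMENTS)
for name, pct in shares.items():
    # ":g" drops trailing zeros, so 16.0 is shown as 16.
    print(f"{name}: {pct:g}%")
```

Summing the counts this way also confirms that the segment column adds up to the stated total.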