Diff for "PolishCoreferenceCorpus"

Differences between revisions 5 and 60 (spanning 55 versions)
Revision 5 as of 2013-01-08 09:33:34
Size: 2380
Editor: MateuszKopec
Comment:
Revision 60 as of 2023-04-24 13:05:58
Size: 1775
Comment:
Line 2: Line 2:
= Polish Coreference Corpus =
This page describes the corpus of Polish coreference, which was created as a part of the [[CORE]] project.
= Polish Coreference Corpus / Korpus zależności referencyjnych =
This page offers the official [[https://creativecommons.org/licenses/by-nc/4.0/deed.pl|Creative Commons Attribution-NonCommercial 4.0 International License]] release of the corpus of Polish coreference, which was created as a part of the [[CORE]] and [[COTHEC]] projects. By downloading the corpus data you accept the conditions of that licence.
Line 5: Line 5:
Approximate corpus texts type distribution:
'''Contact person:'''
[[MaciejOgrodniczuk|Maciej Ogrodniczuk]]<<BR>>
Line 7: Line 8:
|| '''Texts type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
||Dailies ||459 ||127500 ||25.5% ||
||Magazines ||406 ||117500 ||23.5% ||
||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% ||
||Non-fiction literature ||96 ||27500 ||5.5% ||
||Instructive writing and textbooks ||100 ||27500 ||5.5% ||
||Spoken – conversational ||83 ||25000 ||5% ||
||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% ||
||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% ||
||Miscellaneous written (legal, advertisements, user manuals, letters)||55 ||15000 ||3% ||
||Spoken from the media ||44 ||12500 ||2.5% ||
||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% ||
||Academic writing and textbooks ||35 ||10000 ||2% ||
||Unclassified written ||19 ||5000 ||1% ||
||Journalistic books ||19 ||5000 ||1% ||
||''Total'' ||''1773'' ||''500000'' ||''100%'' ||
'''License:'''
CC BY-NC 4.0
Line 24: Line 11:
To be updated.
{{http://i.creativecommons.org/l/by-nc/4.0/88x31.png}}

== Documentation ==

 * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
 * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]

== Downloads ==

The corpus is available for download in 3 formats:
 * [[attachment:PCC-1.5-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
 * [[attachment:PCC-1.5-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
 * [[attachment:PCC-1.5-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])

== Online version ==

The corpus is available:
 * [[http://cothec.nlp.ipipan.waw.pl/|for browsing]]
 * [[http://pcc.nlp.ipipan.waw.pl/|for search]]

You may also want to see [[PolishCoreferenceTools|Polish Coreference Tools site]].

== Citing ==
When using Polish Coreference Corpus, please cite our books on coreference:
<<BibMate(key, "ogr:etal:15:gruyter", omitYears=true)>>
<<BibMate(key, "ogr:19:wuw", omitYears=true)>>

but you can also check [[http://core.ipipan.waw.pl/|the project page]] for earlier publications.

Polish Coreference Corpus / Korpus zależności referencyjnych

This page offers the official Creative Commons Attribution-NonCommercial 4.0 International License release of the corpus of Polish coreference, which was created as a part of the CORE and COTHEC projects. By downloading the corpus data you accept the conditions of that licence.

Contact person: Maciej Ogrodniczuk

License: CC BY-NC 4.0

http://i.creativecommons.org/l/by-nc/4.0/88x31.png

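Documentation

 * Description of the corpus, in English
 * Description of the corpus, in Polish

Downloads

The corpus is available for download in 3 formats:

 * full corpus in MMAX format (example text in MMAX format)
 * full corpus in TEI format (example text in TEI format)
 * full corpus in BRAT format (example text in BRAT format)

Online version

The corpus is available:

 * for browsing (http://cothec.nlp.ipipan.waw.pl/)
 * for search (http://pcc.nlp.ipipan.waw.pl/)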

You may also want to see Polish Coreference Tools site.

Citing

When using Polish Coreference Corpus, please cite our books on coreference:

Maciej Ogrodniczuk, Katarzyna Głowińska, Mateusz Kopeć, Agata Savary, and Magdalena Zawisławska. Coreference in Polish: Annotation, Resolution and Evaluation. Walter De Gruyter, Berlin, München, Boston, 2015.

Maciej Ogrodniczuk. Automatyczne wykrywanie nominalnych zależności referencyjnych w polskich tekstach współczesnych. Wydawnictwa Uniwersytetu Warszawskiego, Warsaw, 2019.

but you can also check the project page for earlier publications.