Diff for "PolishCoreferenceCorpus"

Differences between revisions 5 and 59 (spanning 54 versions)
Revision 5 as of 2013-01-08 09:33:34
Size: 2380
Editor: MateuszKopec
Comment:
Revision 59 as of 2022-08-12 20:08:26
Size: 1751
Comment:
Line 2: Line 2:
= Polish Coreference Corpus =
This page describes the corpus of Polish coreference, which was created as a part of the [[CORE]] project.
= Polish Coreference Corpus / Korpus zależności referencyjnych =
This page offers the official [[http://creativecommons.org/licenses/by/3.0/deed.en_US|Creative Commons Attribution 3.0 Unported License]] release of the corpus of Polish coreference, which was created as a part of the [[CORE]] and [[COTHEC]] projects. By downloading the corpus data you accept the conditions of that licence.
Line 5: Line 5:
Approximate corpus texts type distribution:
'''Contact person:'''
[[MaciejOgrodniczuk|Maciej Ogrodniczuk]]<<BR>>
'''License:''' CC-BY-NC v.4
Line 7: Line 9:
|| '''Texts type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
||Dailies ||459 ||127500 ||25.5% ||
||Magazines ||406 ||117500 ||23.5% ||
||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% ||
||Non-fiction literature ||96 ||27500 ||5.5% ||
||Instructive writing and textbooks ||100 ||27500 ||5.5% ||
||Spoken – conversational ||83 ||25000 ||5% ||
||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% ||
||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% ||
||Miscellaneous written (legal, advertisements, user manuals, letters)||55 ||15000 ||3% ||
||Spoken from the media ||44 ||12500 ||2.5% ||
||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% ||
||Academic writing and textbooks ||35 ||10000 ||2% ||
||Unclassified written ||19 ||5000 ||1% ||
||Journalistic books ||19 ||5000 ||1% ||
||''Total'' ||''1773'' ||''500000'' ||''100%'' ||
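The Percent column can be reproduced from the segment counts. The following is a minimal Python sketch, assuming the counts in the table are exact; the names `segments`, `total` and `shares` are illustrative and not part of the corpus distribution.

```python
# Segment counts copied from the table above (text type -> number of segments).
segments = {
    "Dailies": 127500,
    "Magazines": 117500,
    "Fiction literature (prose, poetry, drama)": 80000,
    "Non-fiction literature": 27500,
    "Instructive writing and textbooks": 27500,
    "Spoken – conversational": 25000,
    "Internet – interactive (blogs, forums, usenet)": 17500,
    "Internet – non-interactive (static pages, Wikipedia)": 17500,
    "Miscellaneous written (legal, advertisements, user manuals, letters)": 15000,
    "Spoken from the media": 12500,
    "Quasi-spoken (parliamentary transcripts)": 12500,
    "Academic writing and textbooks": 10000,
    "Unclassified written": 5000,
    "Journalistic books": 5000,
}

# The total should match the last row of the table.
total = sum(segments.values())

# Share of each text type as a percentage of all segments.
shares = {name: 100 * count / total for name, count in segments.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```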
{{http://i.creativecommons.org/l/by-nc/4.0/88x31.png}}
Line 24: Line 11:
To be updated.
== Documentation ==

 * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
 * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]

== Downloads ==

The corpus is available for download in 3 formats:
 * [[attachment:PCC-1.5-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
 * [[attachment:PCC-1.5-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
 * [[attachment:PCC-1.5-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])
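After downloading one of the archives, a quick way to see what it contains is to count its files by extension. The sketch below uses only the Python standard library. The helper name `count_by_extension` is hypothetical, and nothing is assumed about the internal layout of the PCC archives beyond their being ordinary ZIP files.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(archive):
    """Count the files in a ZIP archive by extension.

    `archive` may be a path to a local file or a binary file-like object.
    Directory entries are skipped; files without an extension are
    counted under "(none)".
    """
    with zipfile.ZipFile(archive) as zf:
        return Counter(
            PurePosixPath(info.filename).suffix.lower() or "(none)"
            for info in zf.infolist()
            if not info.is_dir()
        )


# Example use, assuming the TEI archive was saved to the working directory:
#     for ext, n in count_by_extension("PCC-1.5-TEI.zip").most_common():
#         print(ext, n)
```

The same call works for the MMAX and BRAT archives, since the function relies only on the ZIP container and not on the annotation format.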

== Online version ==

The corpus is available:
 * [[http://cothec.nlp.ipipan.waw.pl/|for browsing]]
 * [[http://pcc.nlp.ipipan.waw.pl/|for search]]

You may also want to see the [[PolishCoreferenceTools|Polish Coreference Tools site]].

== Citing ==
When using the Polish Coreference Corpus, please cite our books on coreference:
<<BibMate(key, "ogr:etal:15:gruyter", omitYears=true)>>
<<BibMate(key, "ogr:19:wuw", omitYears=true)>>

but you can also check [[http://core.ipipan.waw.pl/|the project page]] for earlier publications.

Polish Coreference Corpus / Korpus zależności referencyjnych

This page offers the official Creative Commons Attribution 3.0 Unported License release of the corpus of Polish coreference, which was created as a part of the CORE and COTHEC projects. By downloading the corpus data you accept the conditions of that licence.

Contact person: Maciej Ogrodniczuk
License: CC-BY-NC v.4

http://i.creativecommons.org/l/by-nc/4.0/88x31.png

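Documentation

 * Description of the corpus, in English (PCC_README_EN.pdf)
 * Description of the corpus, in Polish (PCC_README_PL.pdf)

Downloads

The corpus is available for download in 3 formats:

 * full corpus in MMAX format (PCC-1.5-MMAX.zip); example text in MMAX format (example_text_mmax.zip)
 * full corpus in TEI format (PCC-1.5-TEI.zip); example text in TEI format (example_text_tei.zip)
 * full corpus in BRAT format (PCC-1.5-BRAT.zip); example text in BRAT format (example_text_brat.zip)

Online version

The corpus is available:

 * for browsing: http://cothec.nlp.ipipan.waw.pl/
 * for search: http://pcc.nlp.ipipan.waw.pl/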

You may also want to see the Polish Coreference Tools site.

Citing

When using the Polish Coreference Corpus, please cite our books on coreference:

Maciej Ogrodniczuk, Katarzyna Głowińska, Mateusz Kopeć, Agata Savary, and Magdalena Zawisławska. Coreference in Polish: Annotation, Resolution and Evaluation. Walter De Gruyter, Berlin, München, Boston, 2015.

Maciej Ogrodniczuk. Automatyczne wykrywanie nominalnych zależności referencyjnych w polskich tekstach współczesnych. Wydawnictwa Uniwersytetu Warszawskiego, Warsaw, 2019.

but you can also check the project page for earlier publications.