Differences between revisions 4 and 39 (spanning 35 versions)
| Size: 2333 Comment:  | Size: 1669 Comment:  | 
| Line 3: | Line 3: | 
| This page describes the corpus of Polish coreference, which was created as a part of the [[CORE]] project. | This page offers the official [[http://creativecommons.org/licenses/by/3.0/deed.en_US|Creative Commons Attribution 3.0 Unported License]] release of the corpus of Polish coreference, which was created as a part of the [[CORE]] project. By downloading the corpus data you accept the conditions of that licence. | 
| Line 5: | Line 5: | 
| To be updated. | '''Contact person:''' [[MaciejOgrodniczuk|Maciej Ogrodniczuk]]<<BR>> '''License:''' CC BY v.3 | 
| Line 7: | Line 9: | 
|
|| '''Texts type''' || '''# of texts''' || '''# of segments''' || '''Percent''' ||
|| Dailies || 459 || 127500 || 25.5% ||
|| Magazines || 406 || 117500 || 23.5% ||
|| Fiction literature (prose, poetry, drama) || 288 || 80000 || 16% ||
|| Non-fiction literature || 96 || 27500 || 5.5% ||
|| Instructive writing and textbooks || 100 || 27500 || 5.5% ||
|| Spoken – conversational || 83 || 25000 || 5% ||
|| Internet – interactive (blogs, forums, usenet) || 63 || 17500 || 3.5% ||
|| Internet – non-interactive (static pages, Wikipedia) || 63 || 17500 || 3.5% ||
|| Miscellaneous written (legal, advertisements, user manuals, letters) || 55 || 15000 || 3% ||
|| Spoken from the media || 44 || 12500 || 2.5% ||
|| Quasi-spoken (parliamentary transcripts) || 43 || 12500 || 2.5% ||
|| Academic writing and textbooks || 35 || 10000 || 2% ||
|| Unclassified written || 19 || 5000 || 1% ||
|| Journalistic books || 19 || 5000 || 1% ||
|| ''Total'' || ''1773'' || ''500000'' || ''100%'' ||
|
{{http://i.creativecommons.org/l/by/3.0/88x31.png}}

== Documentation ==
 * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]]
 * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]]

== Downloads ==
For the time being, a preliminary version (0.92) of the corpus is available for download in 3 formats:
 * [[attachment:PCC-0.92-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]])
 * [[attachment:PCC-0.92-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]])
 * [[attachment:PCC-0.92-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]])

== Online version ==
The corpus may be browsed online at the following [[http://core.ipipan.waw.pl/pcc/browse|link]].
You may also want to see [[PolishCoreferenceTools|Polish Coreference Tools site]].

== Citing ==
When using Polish Coreference Corpus, please cite our book on coreference:
<<BibMate(key, "ogr:etal:15:gruyter", omitYears=true)>>
but you can also check [[http://core.ipipan.waw.pl/|the project page]] for earlier publications.
|
Polish Coreference Corpus
This page offers the official release of the corpus of Polish coreference, which was created as a part of the CORE project. The corpus is released under the Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/deed.en_US). By downloading the corpus data you accept the conditions of that license.
Contact person: Maciej Ogrodniczuk
 License: CC BY v.3 
 
 
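Documentation
 * Description of the corpus, in English (PCC_README_EN.pdf)
 * Description of the corpus, in Polish (PCC_README_PL.pdf)
Downloads
For the time being, a preliminary version (0.92) of the corpus is available for download in three formats:
 * full corpus in MMAX format (PCC-0.92-MMAX.zip; example text: example_text_mmax.zip)
 * full corpus in TEI format (PCC-0.92-TEI.zip; example text: example_text_tei.zip)
 * full corpus in BRAT format (PCC-0.92-BRAT.zip; example text: example_text_brat.zip)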
Online version
The corpus may be browsed online at http://core.ipipan.waw.pl/pcc/browse.
You may also want to see the Polish Coreference Tools site.
Citing
When using the Polish Coreference Corpus, please cite our book on coreference (BibMate key: ogr:etal:15:gruyter). You can also check the project page (http://core.ipipan.waw.pl/) for earlier publications.

 
 
                            

