Differences between revisions 5 and 31 (spanning 26 versions)
| Size: 2380 Comment:  | Size: 1559 Comment:  | 
| Deletions are marked like this. | Additions are marked like this. | 
| Line 3: | Line 3: | 
| This page describes the corpus of Polish coreference, which was created as a part of the [[CORE]] project. | This page offers the official [[http://creativecommons.org/licenses/by/3.0/deed.en_US|Creative Commons Attribution 3.0 Unported License]] release of the corpus of Polish coreference, which was created as a part of the [[CORE]] project. By downloading the corpus data you accept the conditions of that licence. | 
| Line 5: | Line 5: | 
| Approximate corpus texts type distribution: | '''Contact person:''' [[MateuszKopec|Mateusz Kopeć]]<<BR>> '''License:''' CC BY v.3 | 
| Line 7: | Line 9: | 
| || '''Texts type'''                                                   || '''# of texts''' || '''# of segments''' || '''Percent''' || ||Dailies ||459 ||127500 ||25.5% || ||Magazines ||406 ||117500 ||23.5% || ||Fiction literature (prose, poetry, drama) ||288 ||80000 ||16% || ||Non-fiction literature ||96 ||27500 ||5.5% || ||Instructive writing and textbooks ||100 ||27500 ||5.5% || ||Spoken – conversational ||83 ||25000 ||5% || ||Internet – interactive (blogs, forums, usenet) ||63 ||17500 ||3.5% || ||Internet – non-interactive (static pages, Wikipedia) ||63 ||17500 ||3.5% || ||Miscellaneous written (legal, advertisements, user manuals, letters)||55 ||15000 ||3% || ||Spoken from the media ||44 ||12500 ||2.5% || ||Quasi-spoken (parliamentary transcripts) ||43 ||12500 ||2.5% || ||Academic writing and textbooks ||35 ||10000 ||2% || ||Unclassified written ||19 ||5000 ||1% || ||Journalistic books ||19 ||5000 ||1% || ||''Total'' ||''1773'' ||''500000'' ||''100%'' || | {{http://i.creativecommons.org/l/by/3.0/88x31.png}} | 
| Line 24: | Line 11: | 
| To be updated. | == Documentation == * [[attachment:PCC_README_EN.pdf|Description of the corpus, in English]] * [[attachment:PCC_README_PL.pdf|Description of the corpus, in Polish]] == Downloads == For the time being, a preliminary version (0.9) of the corpus is available for download in 3 formats: * [[attachment:PCC-0.9-MMAX.zip|full corpus in MMAX format]] ([[attachment:example_text_mmax.zip|example text in MMAX format]]) * [[attachment:PCC-0.9-TEI.zip|full corpus in TEI format]] ([[attachment:example_text_tei.zip|example text in TEI format]]) * [[attachment:PCC-0.9-BRAT.zip|full corpus in BRAT format]] ([[attachment:example_text_brat.zip|example text in BRAT format]]) == Online version == The corpus may be browsed online at the following [[http://glass.ipipan.waw.pl:11111/index.xhtml|link]]. You may also want to see [[PolishCoreferenceTools|Polish Coreference Tools site]]. == Citing == When using Polish Coreference Corpus, please cite the following article: <<BibMate(key, "ogro:etal:13:ltc", omitYears=true)>> | 
Polish Coreference Corpus
This page hosts the official release of the Polish Coreference Corpus, which was created as part of the CORE project and is distributed under the Creative Commons Attribution 3.0 Unported License. By downloading the corpus data you accept the conditions of that licence.
Contact person: Mateusz Kopeć
License: CC BY 3.0
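Documentation
* Description of the corpus, in English (PCC_README_EN.pdf)
* Description of the corpus, in Polish (PCC_README_PL.pdf)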
Downloads
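A preliminary version (0.9) of the corpus is currently available for download in three formats:
* full corpus in MMAX format (PCC-0.9-MMAX.zip); example text: example_text_mmax.zip
* full corpus in TEI format (PCC-0.9-TEI.zip); example text: example_text_tei.zip
* full corpus in BRAT format (PCC-0.9-BRAT.zip); example text: example_text_brat.zip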
Online version
The corpus may be browsed online at http://glass.ipipan.waw.pl:11111/index.xhtml.
You may also want to see the Polish Coreference Tools site.
Citing
When using the Polish Coreference Corpus, please cite the following article (bibliography key ogro:etal:13:ltc):