Redistributable subcorpora of the National Corpus of Polish
Free subcorpus
The free subcorpus of the National Corpus of Polish (http://nkjp.pl/index.php?page=0&lang=1) consists of all its texts that are free from intellectual property constraints. These texts can be redistributed without limitations.
Contents (thousands of words):
Polish name | English name | Thousands of words
Książka | Book | 67
Komisja Śledcza (Rywin) | Investigative Committee (Rywin) | 2650
Komisja Śledcza (Orlen) | Investigative Committee (Orlen) | 1973
Dzienniki Ustaw 1920-2006, kodeksy, Konstytucja | Journals of Laws 1920-2006, legal codes, Constitution | 6970
Transkrypcje obrad Sejmu 1-4 kadencji | Sejm proceedings transcripts, terms 1-4 | 62642
Transkrypcje obrad Senatu 2-7 kadencji | Senate proceedings transcripts, terms 2-7 | 24979
razem | total | 99281
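As a quick consistency check, the per-source counts in the table above can be summed and compared with the stated total. The snippet below is only an illustrative sketch; the dictionary name `counts_thousands` is an assumption introduced here, and the figures are copied from the table.

```python
# Word counts in thousands, copied from the contents table above.
# The variable name is illustrative and not part of the corpus distribution.
counts_thousands = {
    "Książka": 67,
    "Komisja Śledcza (Rywin)": 2650,
    "Komisja Śledcza (Orlen)": 1973,
    "Dzienniki Ustaw 1920-2006, kodeksy, Konstytucja": 6970,
    "Transkrypcje obrad Sejmu 1-4 kadencji": 62642,
    "Transkrypcje obrad Senatu 2-7 kadencji": 24979,
}

# Sum the individual sources and compare with the total given in the table.
total = sum(counts_thousands.values())
stated_total = 99281
matches = total == stated_total
print(total, matches)
```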
Download: 5SCAL-free.tar
Wikipedia subcorpus
This subcorpus consists of Wikipedia articles and is available under the Wikipedia license (http://en.wikipedia.org/wiki/Wikipedia:Copyrights).
Contents:
- 634 000 Wikipedia articles,
- divided into 634 parts, 1000 articles each,
- 140 827 553 segments (words) in total.
Download: nkjp-wikipedia.tar.gz
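Once an archive has been downloaded, its contents can be inspected before unpacking. The following is a minimal sketch using Python's standard `tarfile` module; the helper name `list_members` and the local file paths are assumptions, and nothing here presumes a particular directory layout inside the archives.

```python
import tarfile


def list_members(archive_path, limit=20):
    """Return up to `limit` member names from a tar archive.

    Mode "r:*" lets tarfile detect the compression on its own, so the same
    call handles a plain .tar and a .tar.gz file.
    """
    names = []
    with tarfile.open(archive_path, mode="r:*") as archive:
        # Iterating reads member headers one by one, which avoids scanning
        # the whole archive when only the first few names are wanted.
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


# Example usage (assumes the file was saved in the current directory):
# for name in list_members("nkjp-wikipedia.tar.gz"):
#     print(name)
```

Listing a handful of members first is a cheap way to see how the parts are organised before extracting what may be a large amount of data.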