Free subcorpus
The free subcorpus of the National Corpus of Polish consists of all its texts that are free of intellectual property constraints. These texts can be distributed without limitations.
Contents (thousands of words):
Book | Książka | 67
Investigative Committee (Rywin) | Komisja Śledcza (Rywin) | 2650
Investigative Committee (Orlen) | Komisja Śledcza (Orlen) | 1973
Laws, Constitution | Dzienniki Ustaw 1920-2006, kodeksy, Konstytucja | 6970
Sejm proceedings transcripts | Transkrypcje obrad Sejmu 1-4 kadencji | 62642
Senate proceedings transcripts | Transkrypcje obrad Senatu 2-7 kadencji | 24979
Total | Razem | 99281
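As a quick sanity check on the figures above, the following sketch sums the per-source counts and computes each source's share of the free subcorpus. The dictionary keys are only illustrative labels chosen for this example; the numbers are copied from the table.

```python
# Word counts in thousands, copied from the contents table.
# The keys are illustrative labels, not identifiers used by the corpus itself.
counts_thousands = {
    "Book": 67,
    "Komisja Śledcza (Rywin)": 2650,
    "Komisja Śledcza (Orlen)": 1973,
    "Laws, Constitution": 6970,
    "Sejm proceedings transcripts": 62642,
    "Senate proceedings transcripts": 24979,
}

# The sum of the rows should equal the stated total of 99281 thousand words.
total = sum(counts_thousands.values())

# Fraction of the subcorpus contributed by each source.
shares = {name: count / total for name, count in counts_thousands.items()}

print(f"total: {total} thousand words")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The computed total agrees with the total row, so the table is internally consistent.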
Download: 5SCAL-free.tar
Wikipedia subcorpus
This subcorpus consists of Wikipedia articles and is available under the Wikipedia license.
- 634 000 Wikipedia articles,
- divided into 634 parts, 1000 articles each,
- 140 827 553 segments (words) in total.
Download: nkjp-wikipedia.tar.gz
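Since the archive holds several hundred parts, it may be convenient to inspect it before unpacking everything. The sketch below uses Python's standard `tarfile` module to list the first few entries. It assumes the archive has already been downloaded locally; the helper name `list_members` is hypothetical, and the internal layout of the archive is not described on this page, so the function makes no assumption about file names inside it.

```python
import tarfile
from itertools import islice


def list_members(archive_path, limit=10):
    """Return the names of the first `limit` entries in a tar archive.

    Hypothetical helper for illustration; it reads only as far into the
    archive as needed to collect `limit` entries.
    """
    # Mode "r:*" lets tarfile detect the compression (e.g. gzip) automatically.
    with tarfile.open(archive_path, mode="r:*") as archive:
        # A TarFile object is iterable and yields TarInfo entries in order.
        return [member.name for member in islice(archive, limit)]
```

For example, `list_members("nkjp-wikipedia.tar.gz", limit=5)` would return the names of the first five entries, assuming the file is in the current directory.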