
Diff for "DistrNKJP"

Differences between revisions 3 and 15 (spanning 12 versions)
Revision 3 as of 2012-07-07 22:54:40
Size: 311
Comment:
Revision 15 as of 2014-04-04 15:46:23
Size: 1208
Comment:
Line 1: Line 1:
= Redistributable subcorpus of the National Corpus of Polish =
#acl +All:read Default
= Redistributable subcorpora of the National Corpus of Polish =
Line 3: Line 4:
The distributable subcorpus of the National Corpus of Polish consists of all its texts that are free from intellectual property constraints. Those
text can be distributed without limitations.
== Free subcorpus ==
The distributable subcorpus of the [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]] consists of all its texts that are free from intellectual property constraints. These texts can be distributed without limitations.
Line 6: Line 8:
The download link will be published here soon.
Contents (thousands of words):
||Książka || Book || 67||
||Komisja Śledcza (Rywin) || Investigative Committee (Rywin) || 2650||
||Komisja Śledcza (Orlen) || Investigative Committee (Orlen) || 1973||
||Dzienniki Ustaw 1920-2006, kodeksy, Konstytucja || Laws, Constitution || 6970||
||Transkrypcje obrad Sejmu 1-4 kadencji || The Sejm proceedings transcripts||62642||
||Transkrypcje obrad Senatu 2-7 kadencji || The Senate proceedings transcripts || 24979||
|| '''total''' || '''razem''' || '''99281'''||

Download: [[attachment:5SCAL-free.tar]]

== Wikipedia subcorpus ==
This subcorpus consists of Wikipedia articles and is available under the [[http://en.wikipedia.org/wiki/Wikipedia:Copyrights|Wikipedia license]].

Contents:
 * 634 000 Wikipedia articles,
 * divided into 634 parts, 1000 articles each,
 * 140 827 553 segments (words) in total.

Download: [[attachment:nkjp-wikipedia.tar.gz]]

Redistributable subcorpora of the National Corpus of Polish

Free subcorpus

The distributable subcorpus of the National Corpus of Polish consists of all its texts that are free from intellectual property constraints. These texts can be distributed without limitations.

Contents (thousands of words):

||Książka || Book || 67||
||Komisja Śledcza (Rywin) || Investigative Committee (Rywin) || 2650||
||Komisja Śledcza (Orlen) || Investigative Committee (Orlen) || 1973||
||Dzienniki Ustaw 1920-2006, kodeksy, Konstytucja || Laws, Constitution || 6970||
||Transkrypcje obrad Sejmu 1-4 kadencji || The Sejm proceedings transcripts || 62642||
||Transkrypcje obrad Senatu 2-7 kadencji || The Senate proceedings transcripts || 24979||
||razem || total || 99281||

Download: 5SCAL-free.tar
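
The word counts listed above can be cross-checked with a few lines of Python. This is only a sketch: the figures are copied from the contents table (in thousands of words), and the names `counts`, `total`, and `shares` are introduced here for illustration.

```python
# Word counts in thousands, copied from the contents table above.
counts = {
    "Książka": 67,
    "Komisja Śledcza (Rywin)": 2650,
    "Komisja Śledcza (Orlen)": 1973,
    "Dzienniki Ustaw 1920-2006, kodeksy, Konstytucja": 6970,
    "Transkrypcje obrad Sejmu 1-4 kadencji": 62642,
    "Transkrypcje obrad Senatu 2-7 kadencji": 24979,
}

# The sum should match the "total" row of the table.
total = sum(counts.values())

# Share of each component in the whole subcorpus, in percent.
shares = {name: 100 * n / total for name, n in counts.items()}

# Print components from largest to smallest.
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {counts[name]} thousand words ({share:.1f}%)")
print(f"total: {total} thousand words")
```

The rows do add up to the stated total, and the Sejm transcripts make up well over half of the subcorpus.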

Wikipedia subcorpus

This subcorpus consists of Wikipedia articles and is available under the Wikipedia license.

Contents:

  • 634 000 Wikipedia articles,
  • divided into 634 parts, 1000 articles each,
  • 140 827 553 segments (words) in total.

Download: nkjp-wikipedia.tar.gz
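
After downloading either archive, it may be useful to inspect it before unpacking. The following is a minimal sketch using Python's standard `tarfile` module. The internal directory layout of the archives is not described on this page, so the helper `summarize_archive` (a hypothetical name) only counts regular files and groups them by their top-level path component, without assuming any particular structure.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path):
    """Return (number of regular files, Counter of top-level names).

    Mode "r:*" lets tarfile detect the compression itself, so the same
    call works for a plain .tar and for a .tar.gz file.
    """
    top_level = Counter()
    n_files = 0
    with tarfile.open(path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other entries
            n_files += 1
            parts = PurePosixPath(member.name).parts
            top_level[parts[0] if parts else ""] += 1
    return n_files, top_level


# Example call (assumes the file was saved in the current directory):
# n_files, top_level = summarize_archive("nkjp-wikipedia.tar.gz")
# print(n_files, top_level.most_common(5))
```

If the Wikipedia archive is organized into one directory per part, the counter should show roughly 634 groups, but that layout is an assumption to verify against the actual download.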