
Diff for "NKJPNGrams"

Differences between revisions 8 and 11 (spanning 3 versions)
Revision 8 as of 2012-09-10 16:58:21
Size: 526
Editor: MichalLenart
Comment:
Revision 11 as of 2021-01-26 11:10:30
Size: 647
Comment:
Line 2:
Revision 8: = N-grams from balanced National Corpus of Polish =
Revision 11: = N-grams from the balanced subcorpus of the National Corpus of Polish =
Line 4:
Revision 8: The resource is a set of N-grams extracted from balanced [[http://nkjp.pl|National Corpus of Polish]] for N from 1 to 5. Each unigram is maximum continuous chunk of non-whitespace lower-case characters. The resource contains all unique N-grams followed by number of occurrencies.
Revision 11: The resource is a set of N-grams extracted from the balanced subcorpus of [[http://nkjp.pl|National Corpus of Polish]] (300M tokens) for N from 1 to 5. Each unigram is maximum continuous chunk of non-whitespace lower-case characters. The resource contains all unique N-grams followed by number of occurrencies.
Line 13 (added in revision 11):

== Licence ==

NKJP ngrams are made available on CC-BY licence.

N-grams from the balanced subcorpus of the National Corpus of Polish

The resource is a set of N-grams, for N from 1 to 5, extracted from the balanced subcorpus of the National Corpus of Polish (300M tokens). Each unigram is a maximal contiguous chunk of non-whitespace, lower-case characters. The resource lists every unique N-gram followed by its number of occurrences.
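
The definition above can be illustrated with a minimal Python sketch. It is not the tool used to build the resource. It assumes that the text is lower-cased before being split on whitespace, so punctuation stays attached to the neighbouring word. The function names and the one-line record layout handled by `parse_line` (words, then the count, separated by whitespace) are hypothetical, chosen only for illustration; check the downloaded files for the actual layout.

```python
from collections import Counter


def tokenize(text):
    # Lower-case the text, then split on runs of whitespace, so every token
    # is a maximal chunk of non-whitespace characters.
    return text.lower().split()


def count_ngrams(text, max_n=5):
    # Count every N-gram for N = 1..max_n; keys are tuples of tokens.
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def parse_line(line):
    # ASSUMED record layout: "w1 w2 ... wN count" separated by whitespace.
    *words, count = line.split()
    return tuple(words), int(count)


if __name__ == "__main__":
    counts = count_ngrams("Ala ma kota, a kot ma Alę.")
    for ngram, freq in counts.most_common(3):
        print(" ".join(ngram), freq)
```

Because tokens are only split on whitespace, "kota," and "kota" would be counted as different unigrams under this assumption.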

Downloads

Licence

NKJP N-grams are made available under the CC-BY licence.