Revision 11 as of 2021-01-26 11:10:30
N-grams from the balanced subcorpus of the National Corpus of Polish
The resource is a set of N-grams, for N from 1 to 5, extracted from the balanced subcorpus of the National Corpus of Polish (http://nkjp.pl; 300M tokens). Each unigram is a maximal contiguous chunk of non-whitespace, lower-case characters. The resource lists every unique N-gram followed by its number of occurrences.
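The unigram definition above can be illustrated with a minimal Python sketch. It is not the tool used to build the resource: the function name `count_ngrams` is hypothetical, lower-casing the input with `str.lower()` is an assumed reading of "lower-case characters", and treating the whole text as one token sequence (so that N-grams may cross sentence boundaries) is also an assumption.

```python
from collections import Counter


def count_ngrams(text, max_n=5):
    """Count all N-grams for N = 1..max_n in a text (illustrative only)."""
    # Assumption: a unigram is a maximal run of non-whitespace characters,
    # lower-cased; punctuation therefore stays attached to the word.
    tokens = text.lower().split()
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


counts = count_ngrams("Ala ma kota, a kot ma Alę.")
print(counts[1][("ma",)])        # "ma" occurs twice
print(counts[1][("kota,",)])     # the comma is part of the unigram
print(counts[2][("ala", "ma")])  # bigram counted once
```

Because tokens are split on whitespace only, "kota," and "kota" would be counted as distinct unigrams under this reading.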
Downloads
* 1grams.gz
* 2grams.gz
* 3grams.gz
* 4grams.gz
* 5grams.gz
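A downloaded file such as 2grams.gz could be read along the following lines. This is a sketch under stated assumptions, since the page does not document the exact file layout: it assumes one N-gram per line, whitespace-separated fields with the count as the last field, and UTF-8 encoding. The helper names `parse_line` and `read_ngrams` are hypothetical.

```python
import gzip


def parse_line(line):
    """Split one line into (ngram_tuple, count).

    Assumption: fields are whitespace-separated and the last field
    is the number of occurrences.
    """
    *tokens, count = line.split()
    return tuple(tokens), int(count)


def read_ngrams(path, encoding="utf-8"):
    """Yield (ngram_tuple, count) pairs from a gzip-compressed N-gram file."""
    # Assumption: the files are UTF-8 text, one N-gram per line.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_line(line)


# Example use (requires the file to be present locally):
# for ngram, count in read_ngrams("2grams.gz"):
#     print(" ".join(ngram), count)
```

Reading lazily with a generator avoids loading a whole file into memory, which matters for the higher-order files. If the actual layout differs (for example, a tab separator or the count in another position), only `parse_line` needs adjusting.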
Licence
The NKJP N-grams are made available under the CC-BY licence.