
Diff for "NKJPNGrams"

Differences between revisions 7 and 9 (spanning 2 versions)
Revision 7 as of 2012-08-01 11:13:07
Size: 502
Editor: MichalLenart
Comment:
Revision 9 as of 2013-01-25 15:20:35
Size: 540
Editor: MichalLenart
Comment:
Line 1: Line 1:
#acl +All:read Default
Line 3: Line 4:
Revision 7: The resource is a set of N-grams extracted from balanced [[http://nkjp.pl|National Corpus of Polish]] for N from 1 to 5. Each unigram is maximum continuous chunk of non-whitespace lower-case characters. The resource contains all unique N-grams followed by number of occurrencies.
Revision 9: The resource is a set of N-grams extracted from balanced [[http://nkjp.pl|National Corpus of Polish]] (300M tokens) for N from 1 to 5. Each unigram is maximum continuous chunk of non-whitespace lower-case characters. The resource contains all unique N-grams followed by number of occurrencies.

N-grams from balanced National Corpus of Polish

The resource is a set of N-grams extracted from the balanced National Corpus of Polish (300M tokens) for N from 1 to 5. Each unigram is a maximal contiguous sequence of non-whitespace, lower-case characters. The resource lists every unique N-gram followed by its number of occurrences.
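
The description above can be illustrated with a minimal Python sketch. It is not the tool used to build the resource. It assumes that text is lower-cased before splitting on whitespace and that N-grams are counted across the whole input without regard to sentence boundaries; the function name `count_ngrams` and the sample sentence are hypothetical, and the on-disk format of the downloadable files is not reproduced here.

```python
from collections import Counter


def count_ngrams(text, max_n=5):
    """Count N-grams for N = 1..max_n over whitespace-separated tokens.

    Assumption: the text is lower-cased first, so every unigram is a
    maximal run of non-whitespace, lower-case characters.
    """
    tokens = text.lower().split()
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[n][" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample input, not taken from the corpus.
sample = "Ala ma kota , a kot ma Alę ."
counts = count_ngrams(sample)

# Print each unique bigram followed by its number of occurrences.
for ngram, freq in sorted(counts[2].items()):
    print(ngram, freq)
```

Because tokens are split on whitespace only, punctuation that is already separated by spaces (as in the sample) is counted as a unigram in its own right.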

Downloads