N-grams from the balanced National Corpus of Polish
The resource is a set of N-grams, for N from 1 to 5, extracted from the balanced National Corpus of Polish. Each unigram is a maximal contiguous chunk of non-whitespace, lower-case characters. The resource lists every unique N-gram, each followed by its number of occurrences.

== Downloads ==
 * [[attachment:1grams.gz]]
 * [[attachment:2grams.gz]]
 * [[attachment:3grams.gz]]
 * [[attachment:4grams.gz]]
 * [[attachment:5grams.gz]]
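The tokenization and counting described above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the resource. It assumes that "lower-case" means the text is lower-cased before splitting, that punctuation stays attached to tokens (since unigrams are simply non-whitespace chunks), and that the input is treated as one continuous token sequence. Whether the original extraction stopped N-grams at sentence or document boundaries is not stated. The function names `tokenize`, `count_ngrams` and `ngram_tables` are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Unigrams: maximal runs of non-whitespace characters, lower-cased.
    # str.split() with no argument splits on any run of whitespace.
    return text.lower().split()


def count_ngrams(tokens: list[str], n: int) -> Counter:
    # Slide a window of length n over the token sequence and count each
    # distinct N-gram, stored as a space-joined string.
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def ngram_tables(text: str, max_n: int = 5) -> dict[int, Counter]:
    # One frequency table per N, for N = 1..max_n (1..5 in this resource).
    tokens = tokenize(text)
    return {n: count_ngrams(tokens, n) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    tables = ngram_tables("Ala ma kota, a kot ma Alę. Ala ma kota.")
    for n, counts in tables.items():
        # Each unique N-gram followed by its number of occurrences.
        for gram, freq in counts.most_common(3):
            print(n, gram, freq)
```

For the sample sentence, `tables[2]["ala ma"]` is 2, while `"kota,"` and `"kota."` count as distinct unigrams because punctuation is not stripped. For a corpus of this size, keeping every 5-gram in one in-memory `Counter` may be impractical, so a real pipeline would likely count in chunks or on disk.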