N-grams from balanced National Corpus of Polish
The resource is a set of N-grams, for N from 1 to 5, extracted from the balanced National Corpus of Polish (http://nkjp.pl, 300M tokens). Each unigram is a maximal contiguous sequence of non-whitespace characters, lower-cased. The resource lists every unique N-gram followed by its number of occurrences.
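The counting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build the resource. The function names are hypothetical. N-grams are assumed not to cross line boundaries. The tab separator between an N-gram and its count is also an assumption, because the exact file format is not specified here.

```python
from collections import Counter
from typing import Iterable, TextIO


def count_ngrams(lines: Iterable[str], max_n: int = 5) -> Counter:
    """Count every N-gram for N = 1..max_n (hypothetical helper).

    A unigram is a maximal run of non-whitespace characters, lower-cased.
    str.split() with no argument splits on runs of whitespace, which yields
    exactly those maximal chunks. Assumption: N-grams never span two lines.
    """
    counts: Counter = Counter()
    for line in lines:
        tokens = line.lower().split()
        for n in range(1, max_n + 1):
            # Slide a window of length n over the token list.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def write_counts(counts: Counter, out: TextIO) -> None:
    """Write each unique N-gram followed by its count.

    The tab separator is an assumption for this sketch.
    """
    for ngram, freq in sorted(counts.items()):
        out.write(f"{ngram}\t{freq}\n")


if __name__ == "__main__":
    import sys

    example = count_ngrams(["Ala ma kota", "ala ma psa"])
    write_counts(example, sys.stdout)
```

With the two example lines, "ala" and "ala ma" each occur twice, because lower-casing merges "Ala" and "ala". "ala ma kota" occurs once. An in-memory `Counter` is fine for illustration, but at 300M tokens a real pipeline would probably need sharded or disk-based counting.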
