Revision 9 as of 2013-01-25 15:20:35


NKJPNGrams

N-grams from the balanced National Corpus of Polish

The resource is a set of N-grams, for N from 1 to 5, extracted from the balanced National Corpus of Polish (300M tokens). Each unigram is a maximal contiguous sequence of lower-case, non-whitespace characters. The resource lists every distinct N-gram followed by its number of occurrences.
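The counting scheme described above can be sketched as follows. This is an illustration only, not the tool used to build the resource. It assumes the text is lower-cased before splitting and that tokens are separated by any run of whitespace, so punctuation stays attached to words. The function names `count_ngrams` and `parse_line` are hypothetical. The download layout is not documented here, so `parse_line` assumes one entry per line: the N-gram's tokens separated by spaces, with the count as the last field.

```python
from collections import Counter


def tokenize(text):
    # A unigram is a maximal run of non-whitespace characters.
    # Lower-casing the input first is an assumption.
    return text.lower().split()


def count_ngrams(text, max_n=5):
    # Count every N-gram for N = 1..max_n as a space-joined string.
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def parse_line(line):
    # Hypothetical file layout: N-gram tokens, then the count as the last field.
    *tokens, count = line.split()
    return tuple(tokens), int(count)


if __name__ == "__main__":
    counts = count_ngrams("Ala ma kota. Ala ma psa.")
    print(counts["ala ma"])                # 2
    print(parse_line("ala ma kota 17"))    # (('ala', 'ma', 'kota'), 17)
```

Because tokens are split only on whitespace, "kota." and "kota" count as different unigrams under this sketch.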

Downloads