Diff for "NKJPNGrams"

Differences between revisions 2 and 10 (spanning 8 versions)
Revision 2 as of 2012-08-01 10:26:07
Size: 45
Comment:
Revision 10 as of 2013-05-20 16:10:44
Size: 578
Comment:
## page was renamed from NGrams
= NGrams =
#acl +All:read Default
= N-grams from the balanced subcorpus of the National Corpus of Polish =

The resource is a set of N-grams extracted from the balanced subcorpus of the [[http://nkjp.pl|National Corpus of Polish]] (300M tokens), for N from 1 to 5. Each unigram is a maximal contiguous run of non-whitespace, lower-case characters. The resource lists every unique N-gram followed by its number of occurrences.
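
The unigram definition above can be illustrated with a minimal Python sketch. It is not the tool used to build the resource. It assumes that text is lower-cased and then split on whitespace, so punctuation stays attached to the neighbouring word, and it ignores any sentence or document boundaries the original extraction may have respected. The names `tokenize` and `count_ngrams` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Split text into unigrams: lower-case, then cut on whitespace runs.

    Assumption: punctuation is not separated, so "kota," stays one unigram.
    """
    return text.lower().split()


def count_ngrams(tokens, max_n=5):
    """Count every N-gram for N from 1 to max_n.

    Returns a dict mapping N to a Counter keyed by token tuples.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token sequence.
        for start in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[start:start + n])] += 1
    return counts


if __name__ == "__main__":
    sample = tokenize("Ala ma kota, a kot ma Alę.")
    for ngram, freq in count_ngrams(sample)[2].items():
        print(" ".join(ngram), freq)
```

Under these assumptions the seven-token sample yields six distinct bigrams and three distinct 5-grams.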

== Downloads ==

 * [[attachment:1grams.gz]]
 * [[attachment:2grams.gz]]
 * [[attachment:3grams.gz]]
 * [[attachment:4grams.gz]]
 * [[attachment:5grams.gz]]
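
A downloaded file could be read along the following lines. This is a sketch under assumptions the page does not confirm: each line is taken to hold the N-gram tokens followed by the count as the last whitespace-separated field, and the encoding is taken to be UTF-8. The function name `read_ngrams` is hypothetical.

```python
import gzip


def read_ngrams(path):
    """Yield (tokens, count) pairs from a gzip-compressed N-gram file.

    Assumed line layout: token_1 ... token_N count
    Assumed encoding: UTF-8
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                # Skip blank or malformed lines.
                continue
            *tokens, count = fields
            yield tuple(tokens), int(count)


if __name__ == "__main__":
    # Print the first ten entries of the bigram file, if it is present locally.
    for index, (tokens, count) in enumerate(read_ngrams("2grams.gz")):
        print(" ".join(tokens), count)
        if index == 9:
            break
```

Reading lazily through a generator avoids loading a whole file into memory, which matters for the higher-order N-gram files.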
