
Diff for "NKJP model for TnT Tagger"

Differences between revisions 4 and 7 (spanning 3 versions)
Revision 4 as of 2013-01-29 17:36:48
Size: 764
Comment:
Revision 7 as of 2013-01-29 18:26:00
Size: 857
Comment:
Deletions are prefixed with "-". Additions are prefixed with "+".
Line 1: Line 1:
+ ## page was renamed from TnT
Line 3: Line 4:
- = TnT =
+ = NKJP model for TnT Tagger =
Line 5: Line 6:
- Here you can download a model for the [[http://www.coli.uni-saarland.de/~thorsten/tnt/ | TnT tagger]], built from the manually annotated one-million-word subcorpus of NKJP. The [[attachment:nkjp.zip|model]] is released under a BSD-style license. The downloaded file must be decompressed with a program that supports the zip format.
+ Here you can download the model for [[http://www.coli.uni-saarland.de/~thorsten/tnt/ | TnT tagger]], which was created by training the tagger with the one-million manually annotated subcorpus of the Polish National Corpus. The [[attachment:nkjp.zip|model]] is available under BSD license. The downloaded file has to be unzipped.
Line 7: Line 8:
- Note: to use the tagger, you must obtain a copy from its author. Running the tagger:
+ To use the TnT tagger, you need a copy from its author (see [[http://www.coli.uni-saarland.de/~thorsten/tnt/ | his webpage]]). To run the tagger, use:
Line 11: Line 12:
- The input file must first be split into sentences (separated by two end-of-line characters) and words according to the segmentation rules used in the NKJP corpus. The tagger achieves an average of about 88% correct tags; it does not perform lemmatization.
+ The input file needs to be tokenized into sentences (separated with two end-of-line characters) and words in the way the Polish National Corpus is tokenized. (Use Morfeusz if unsure). The tagger's quality is around 88%.

NKJP model for TnT Tagger

Here you can download a model for the TnT tagger (http://www.coli.uni-saarland.de/~thorsten/tnt/). The model was created by training the tagger on the manually annotated one-million-word subcorpus of the National Corpus of Polish (NKJP). The model (nkjp.zip) is available under a BSD license. The downloaded file must be unzipped before use.

The TnT tagger itself is not included: you need to obtain a copy from its author (see his webpage, http://www.coli.uni-saarland.de/~thorsten/tnt/). To run the tagger, use:

tnt nkjp <file_name>
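
The unzip and run steps can also be scripted. The following is a minimal sketch, not part of the original instructions. It assumes that a `tnt` executable is on the PATH, that TnT finds the extracted model files in the working directory, and that it writes the tagged text to standard output. The helper names (`extract_model`, `build_tnt_command`, `run_tagger`) are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
import zipfile


def extract_model(archive: str, target_dir: str = ".") -> list[str]:
    """Unpack the downloaded model archive and return the names of its members."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


def build_tnt_command(input_file: str, model: str = "nkjp") -> list[str]:
    """Build the argument list corresponding to `tnt nkjp <file_name>`."""
    return ["tnt", model, input_file]


def run_tagger(input_file: str, model: str = "nkjp", workdir: str = ".") -> str:
    """Run TnT in `workdir` and return whatever it writes to standard output.

    Assumption: the model files extracted from nkjp.zip are in `workdir`
    and TnT looks for them there.
    """
    if shutil.which("tnt") is None:
        raise FileNotFoundError("tnt executable not found on PATH")
    result = subprocess.run(
        build_tnt_command(input_file, model),
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tnt exits with an error
    )
    return result.stdout
```

A typical call would be `extract_model("nkjp.zip")` followed by `run_tagger("input.txt")`, where `input.txt` is a hypothetical file name.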

The input file must already be split into sentences (separated by two end-of-line characters, i.e. a blank line) and into words, following the tokenization used in the National Corpus of Polish (use Morfeusz if unsure). The tagger's accuracy is around 88% correctly assigned tags.
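
To illustrate the expected input layout, here is a minimal sketch that serializes already tokenized sentences. It assumes one token per line within a sentence, which the text above does not state explicitly, and it does no tokenization itself. The function name `format_tnt_input` and the sample tokens are hypothetical; real input should follow NKJP tokenization.

```python
from __future__ import annotations


def format_tnt_input(sentences: list[list[str]]) -> str:
    """Render tokenized sentences as tagger input text.

    Assumption: one token per line. Consecutive sentences are separated
    by two end-of-line characters, i.e. a blank line.
    """
    blocks = ["\n".join(tokens) for tokens in sentences if tokens]
    if not blocks:
        return ""
    return "\n\n".join(blocks) + "\n"


# Hand-tokenized sample sentences, for illustration only.
sample = [
    ["Ala", "ma", "kota", "."],
    ["To", "jest", "test", "."],
]
text = format_tnt_input(sample)
```

The resulting string can then be written to the file passed to `tnt`, for example with `pathlib.Path("input.txt").write_text(text, encoding="utf-8")`. The UTF-8 encoding is also an assumption, so check which encoding the model expects.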