Revision 4 as of 2014-12-18 13:08:50
LemmaPL

LemmaPL is a lemmatization tool that combines several existing tools and resources to provide lemmatization performance for Polish that exceeds the state of the art. Specifically, the following tools and resources are used:

  • Morfeusz analyzer (version 1 and 2), by Marcin Woliński (http://sgjp.pl/),
  • WCRFT tagger, by Adam Radziszewski (http://nlp.pwr.wroc.pl/redmine/projects/wcrft/wiki),
  • Spejd parser, by Bartosz Zaborowski and Adam Przepiórkowski (http://zil.ipipan.waw.pl/Spejd),
  • Spejd grammar, by Katarzyna Głowińska, Łukasz Degórski and Piotr Przybyła,
  • an abbreviations dictionary,
  • frequency data from the National Corpus of Polish.

Author: Łukasz Kobyliński
License: GPL

Usage

LemmaPL will be available as a web service (coming soon).

Currently, LemmaPL can be run from one of two Docker images: ipipan/langtools-all, or ipipan/langtools-taggers (with your own WCRFT model attached to the container).

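Instructions for the ipipan/langtools-all image (the -v option mounts the host directory /home/username/my_tests at /root/my_tests inside the container; substitute your own directory):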

  • docker pull ipipan/langtools-all
  • docker run -v /home/username/my_tests:/root/my_tests -it ipipan/langtools-all /bin/bash
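When the test directory changes often, it can help to build the docker run command programmatically rather than retyping it. The following is a minimal sketch in Python, run on the host, that assembles the same command as above. The helper name docker_run_argv and the constants are hypothetical and not part of LemmaPL; the image name, mount point and options are taken from the instructions above.

```python
import shlex
from pathlib import PurePosixPath

IMAGE = "ipipan/langtools-all"    # image name from the instructions above
CONTAINER_DIR = "/root/my_tests"  # mount point from the instructions above


def docker_run_argv(host_dir, image=IMAGE, container_dir=CONTAINER_DIR):
    """Return the argv list for an interactive `docker run` with host_dir mounted.

    Hypothetical helper: it only builds the command and does not run Docker.
    """
    host = PurePosixPath(host_dir)
    if not host.is_absolute():
        # Require an absolute host path so the mount is unambiguous.
        raise ValueError("host_dir must be an absolute path")
    return ["docker", "run", "-v", f"{host}:{container_dir}", "-it", image, "/bin/bash"]


if __name__ == "__main__":
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(docker_run_argv("/home/username/my_tests")))
```

The returned list can be passed to subprocess.run from an interactive terminal; printing it first makes it easy to check the mount before starting the container.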

Inside the container:

  • cd /root/lemmapl
  • python lemmapl.py ../my_tests/test.txt
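The interactive steps above could in principle be collapsed into a single non-interactive call, which is convenient for processing several files. The sketch below, in Python on the host, builds such a command using the standard docker run options --rm (remove the container afterwards) and -w (set the working directory, replacing the cd step). It assumes, without confirmation from the LemmaPL documentation, that lemmapl.py behaves the same when started this way as it does from an interactive shell. The helper name lemmapl_argv is hypothetical.

```python
from pathlib import PurePosixPath

IMAGE = "ipipan/langtools-all"    # image name from the instructions above
LEMMAPL_DIR = "/root/lemmapl"     # directory entered with `cd` above
CONTAINER_DIR = "/root/my_tests"  # mount point from the instructions above


def lemmapl_argv(host_dir, filename):
    """Return the argv list for a one-shot LemmaPL run on one mounted file.

    Hypothetical helper: assumes the image allows running lemmapl.py
    directly, without an interactive shell.
    """
    host = PurePosixPath(host_dir)
    if not host.is_absolute():
        raise ValueError("host_dir must be an absolute path")
    # Same relative path as in the instructions: ../my_tests/<filename>
    input_path = PurePosixPath("..") / PurePosixPath(CONTAINER_DIR).name / filename
    return [
        "docker", "run", "--rm",
        "-v", f"{host}:{CONTAINER_DIR}",
        "-w", LEMMAPL_DIR,
        image_name(),
        "python", "lemmapl.py", str(input_path),
    ]


def image_name():
    """Return the image to use; kept separate so it is easy to override."""
    return IMAGE


if __name__ == "__main__":
    # Build one command per input file; pass each list to subprocess.run to execute.
    for name in ["test.txt"]:
        print(lemmapl_argv("/home/username/my_tests", name))
```

Building one command per file keeps each run isolated in its own container, at the cost of container start-up time for every file.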