LemmaPL

LemmaPL is a lemmatization tool for Polish. It combines several existing tools and resources to provide lemmatization performance that exceeds the state of the art. Specifically, it uses the following:

  • Morfeusz analyzer (versions 1 and 2), by Marcin Woliński,
  • WCRFT tagger, by Adam Radziszewski,
  • Spejd parser, by Bartosz Zaborowski and Adam Przepiórkowski,
  • Spejd grammar, by Katarzyna Głowińska, Łukasz Degórski and Piotr Przybyła,
  • a dictionary of abbreviations,
  • frequency data from the National Corpus of Polish.

Author: Łukasz Kobyliński
License: GPL

Usage

LemmaPL will be available as a web service (coming soon).

Currently, LemmaPL can be run from a Docker container, using either the ipipan/langtools-all image or the ipipan/langtools-taggers image (the latter requires attaching your own WCRFT model to the container).

Instructions for the ipipan/langtools-all image:

  • docker pull ipipan/langtools-all
  • docker run -v /home/username/my_tests:/root/my_tests -it ipipan/langtools-all /bin/bash

Inside the container:

  • cd /root/lemmapl
  • python lemmapl.py ../my_tests/test.txt
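
The interactive steps above could also be combined into a single non-interactive call, for example when processing files from a script. The following is only a sketch under several assumptions: the helper name build_lemmapl_command is hypothetical, the image is assumed to accept a command other than /bin/bash, and the standard Docker options --rm (remove the container afterwards) and -w (set the working directory) are used in place of the manual cd step. It only builds and prints the command; whether lemmapl.py behaves identically when run this way has not been verified here.

```python
from __future__ import annotations

import shlex
from pathlib import Path

IMAGE = "ipipan/langtools-all"    # image name taken from the instructions above
CONTAINER_DIR = "/root/my_tests"  # mount point taken from the instructions above
LEMMAPL_DIR = "/root/lemmapl"     # directory that contains lemmapl.py in the image


def build_lemmapl_command(host_dir: str, input_name: str) -> list[str]:
    """Build a one-shot `docker run` argument list (hypothetical helper).

    host_dir   -- absolute host directory holding the input file
    input_name -- file name inside host_dir, e.g. "test.txt"
    """
    host_path = Path(host_dir)
    if not host_path.is_absolute():
        # The -v option expects an absolute path on the host side.
        raise ValueError("host_dir must be an absolute path")
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{CONTAINER_DIR}",  # same mount as in the manual steps
        "-w", LEMMAPL_DIR,                     # replaces `cd /root/lemmapl`
        IMAGE,
        "python", "lemmapl.py", f"../my_tests/{input_name}",
    ]


if __name__ == "__main__":
    cmd = build_lemmapl_command("/home/username/my_tests", "test.txt")
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
    # To execute it (requires Docker and the pulled image):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when the host directory contains spaces.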