LemmaPL is a lemmatization tool for Polish that combines several existing tools and resources to achieve lemmatization performance above the current state of the art. Specifically, the following tools and resources are used:
- Morfeusz analyzer (versions 1 and 2), by Marcin Woliński,
- WCRFT tagger, by Adam Radziszewski,
- Spejd parser, by Bartosz Zaborowski and Adam Przepiórkowski,
- Spejd grammar, by Katarzyna Głowińska, Łukasz Degórski and Piotr Przybyła,
- abbreviations dictionary,
- frequency data from the National Corpus of Polish.
Author: Łukasz Kobyliński
LemmaPL will be available as a web service (SOON).
Currently, LemmaPL can be run from a Docker container, using either the ipipan/langtools-all image or the ipipan/langtools-taggers image (the latter with your own WCRFT model attached to the container).
Instructions for the ipipan/langtools-all image:
- docker pull ipipan/langtools-all
- docker run -v /home/username/my_tests:/root/my_tests -it ipipan/langtools-all /bin/bash
- cd /root/lemmapl
- python lemmapl.py ../my_tests/test.txt
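The interactive steps above can probably be collapsed into a single non-interactive `docker run` call. The following Python sketch only builds and prints such a command; it does not start Docker itself. It assumes that the image lets the command after the image name replace `/bin/bash`, and that `lemmapl.py` behaves the same when launched this way. The helper name `build_lemmapl_command` is hypothetical and not part of LemmaPL; `--rm`, `-v` and `-w` are standard `docker run` options.

```python
import shlex

# Image name and container paths are taken from the instructions above.
IMAGE = "ipipan/langtools-all"
CONTAINER_TESTS_DIR = "/root/my_tests"
CONTAINER_LEMMAPL_DIR = "/root/lemmapl"


def build_lemmapl_command(host_dir, input_name):
    """Build a one-shot `docker run` argument list (hypothetical helper).

    host_dir   -- absolute host directory holding the input file
    input_name -- file name inside host_dir, e.g. "test.txt"
    """
    return [
        "docker", "run",
        "--rm",                                   # remove the container on exit
        "-v", f"{host_dir}:{CONTAINER_TESTS_DIR}",  # mount the host directory
        "-w", CONTAINER_LEMMAPL_DIR,              # replaces `cd /root/lemmapl`
        IMAGE,
        "python", "lemmapl.py", f"../my_tests/{input_name}",
    ]


if __name__ == "__main__":
    cmd = build_lemmapl_command("/home/username/my_tests", "test.txt")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

To actually run the command, the returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Docker is installed and the image has been pulled.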