Concraft-pl
This page provides the official release of Concraft-pl, a morphosyntactic tagger for Polish based on constrained conditional random fields. The tool combines the following components into a pipeline:
Maca, a tool for morphosyntactic segmentation and analysis,
Concraft, a library for morphosyntactic disambiguation.
Author: Jakub Waszczuk
License: 2-clause BSD
Documentation
See the README file in the development repository.
Downloads
Concraft-pl is distributed as a software package that can be downloaded from Hackage using the Cabal tool. To compile Concraft-pl you will also need the Glasgow Haskell Compiler (GHC). The simplest way to get both Cabal and GHC is to install the Haskell Platform. Please see the documentation for more information about the installation process.
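As a rough sketch of the Cabal-based installation, assuming GHC and Cabal are already on the PATH and that the Hackage package is named concraft-pl (an assumption based on the tool's name; the README remains the authoritative source for the exact steps):

```shell
# Refresh the local index of packages available on Hackage.
cabal update

# Download, compile and install the tagger.
# The package name "concraft-pl" is an assumption; check Hackage if it differs.
cabal install concraft-pl
```

Cabal builds only the Haskell package; Maca is a separate tool and has to be installed on its own.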
Pre-trained model
We provide Concraft-pl models trained on the manually annotated subcorpus of the National Corpus of Polish. Choose the model that matches the version of Concraft-pl you are using.
Publications
Jakub Waszczuk (2012). Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language. In: Proceedings of COLING 2012, Mumbai, India.