Concraft-pl
Concraft-pl is a morphosyntactic tagger for the Polish language based on conditional random fields. The tool is coupled with Morfeusz, a morphosyntactic analyzer for Polish, which represents both morphosyntactic and segmentation ambiguities in the form of a directed acyclic graph (DAG).
More information can be found on the tool's website: https://github.com/kawu/concraft-pl
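To make the DAG representation mentioned above more concrete, here is a minimal Python sketch. It is not Concraft-pl's or Morfeusz's actual data structure or API: the `Edge` type, the `paths` helper, and the simplified tags are assumptions made purely for illustration. The example word "miałem" is used because it can plausibly be read either as a single noun form or as a verb form followed by an agglutinated ending, so the graph carries both a segmentation ambiguity and a tagging ambiguity.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One candidate segment: spans two DAG nodes and carries one tag (hypothetical type)."""
    start: int
    end: int
    orth: str
    tag: str


# Illustrative analysis of "miałem" with simplified tags (an assumption,
# not actual analyzer output): either "miał" + "em", or one noun "miałem".
EDGES = [
    Edge(0, 1, "miał", "praet"),
    Edge(1, 2, "em", "aglt"),
    Edge(0, 2, "miałem", "subst"),
]


def paths(edges, source, target):
    """Enumerate every source-to-target path; each path is one full reading."""
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge.start].append(edge)

    def walk(node):
        if node == target:
            yield []
            return
        for edge in outgoing[node]:
            for rest in walk(edge.end):
                yield [edge] + rest

    return list(walk(source))


readings = paths(EDGES, 0, 2)
for reading in readings:
    print(" + ".join(f"{e.orth}/{e.tag}" for e in reading))
```

Under this view, disambiguation amounts to choosing one path through the graph, which fixes the segmentation and the tags at the same time. The sketch simply enumerates the paths; how the tagger scores them is described in the publications listed below.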
Publications
Jakub Waszczuk. (2012). Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language.
In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 2789–2804, Mumbai, India, 2012.
Jakub Waszczuk, Witold Kieraś, and Marcin Woliński. (2018). Morphosyntactic disambiguation and segmentation for historical Polish with graph-based conditional random fields.
In: Petr Sojka, Aleš Horák, Ivan Kopeček, and Karel Pala, editors, Text, Speech, and Dialogue: 21st International Conference, TSD 2018, Brno, Czech Republic, September 11-14, 2018.