Concraft
This page provides the official release of Concraft, a morphosyntactic disambiguation tool based on constrained conditional random fields.
Author: Jakub Waszczuk
License: 2-clause BSD
Documentation
See the README file in the development repository.
Downloads
Concraft is available in the form of a software distribution that can be downloaded from Hackage using the Cabal tool. To compile Concraft you will also need the Glasgow Haskell Compiler (GHC). The simplest way to get both Cabal and GHC is to install the Haskell Platform. Please see the documentation for more information about the installation process.
Pre-trained model
A model for the Polish language has been trained on the manually annotated subcorpus of the National Corpus of Polish. The corpus was first re-analysed with the Maca tool (using the morfeusz-nkjp-official configuration); the same preprocessing pipeline should be used to prepare input data for morphosyntactic disambiguation.
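This page does not describe the model itself. What follows is a minimal sketch of what "constrained" decoding can look like, assuming (as the term disambiguation suggests) that the tagger chooses only among the candidate interpretations proposed by the morphological analyser, which is Maca in the pipeline described here. It uses a plain first-order Viterbi search with hypothetical unary and transition scores. It is not Concraft's API, its actual CRF model, or the specific constraints from the COLING 2012 paper. `Token`, `viterbi`, `unaryScore` and `transScore` are hypothetical names, and the tags are simplified NKJP class labels rather than full positional tags.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- | Hypothetical token: a word form plus the candidate tags proposed
-- by a morphological analyser. Not Concraft's actual data structure.
data Token = Token
  { orth       :: String
  , candidates :: [String]  -- assumed non-empty for every token
  } deriving Show

-- | First-order Viterbi decoding restricted to each token's candidate
-- set: the search space is bounded by the analyser output rather than
-- by the full tagset. Scores are hypothetical log-domain values.
viterbi
  :: (String -> String -> Double)  -- unary score: word form -> tag
  -> (String -> String -> Double)  -- transition score: previous tag -> tag
  -> [Token]
  -> [String]
viterbi _ _ [] = []
viterbi unary trans (t : ts) = reverse (snd best)
  where
    -- First column: (score, reversed path) for each candidate of t.
    start = [ (unary (orth t) y, [y]) | y <- candidates t ]
    -- For each candidate of the next token, keep only the best predecessor.
    step col tok =
      [ maximumBy (comparing fst)
          [ (s + trans prev y + unary (orth tok) y, y : path)
          | (s, path@(prev : _)) <- col ]
      | y <- candidates tok ]
    best = maximumBy (comparing fst) (foldl step start ts)

-- Toy scores for illustration only (hypothetical values).
unaryScore :: String -> String -> Double
unaryScore "ma" "fin"   = 0.5
unaryScore "ma" "subst" = 1.0
unaryScore _    _       = 0

transScore :: String -> String -> Double
transScore "subst" "fin"   = 2.0
transScore "fin"   "subst" = 2.0
transScore _       _       = 0
```

Loaded in GHCi, `viterbi unaryScore transScore [Token "Ala" ["subst"], Token "ma" ["fin", "subst"], Token "kota" ["subst"]]` evaluates to `["subst","fin","subst"]`: the unary score alone prefers `subst` for "ma", but the transition scores make the verbal reading win. Because tags outside each candidate set are never considered, input analysed with a different analyser configuration can offer candidates the model handles poorly. This is one intuition for reusing the same preprocessing pipeline.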
Publications
Jakub Waszczuk (2012). Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language.
In: Proceedings of COLING 2012, Mumbai, India.