Diff for "Concraft"

Differences between revisions 30 and 33 (spanning 3 versions)
Revision 30 as of 2013-11-20 13:02:01
Size: 2039
Comment:
Revision 33 as of 2019-09-04 08:56:18
Size: 1282
Comment:
Line 1: Line 1:
#acl +All:read Default
#acl JakubWaszczuk:read,write,revert All:read
Line 4: Line 4:
This page provides the official release of Concraft-pl, a morphosyntactic tagger for Polish based on constrained conditional random fields. The tool combines the following components into a pipeline:
Concraft-pl is a morphosyntactic tagger for the Polish language based on conditional random fields. The tool is coupled with [[http://sgjp.pl/morfeusz/index.html|Morfeusz]], a morphosyntactic analyzer for Polish, which represents both morphosyntactic and segmentation ambiguities in the form of a directed acyclic graph (DAG).
Line 6: Line 6:
 * A morphosyntactic segmentation and analysis tool [[http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki|Maca]],
 * A morphosyntactic disambiguation library [[https://github.com/kawu/concraft#concraft|Concraft]].

'''Author:'''
[[http://zil.ipipan.waw.pl/JakubWaszczuk|Jakub Waszczuk]] <<BR>>
'''License:''' 2-clause BSD

== Documentation ==

See the [[https://github.com/kawu/concraft-pl#concraft-pl|README]] file from the development repository.

== Downloads ==

Concraft-pl is available as a software distribution that can be downloaded from [[http://hackage.haskell.org/package/concraft-pl|Hackage]] using the [[http://www.haskell.org/cabal/|Cabal]] tool. To compile Concraft-pl you will also need the [[http://www.haskell.org/ghc/|Glasgow Haskell Compiler]] (GHC). The simplest way to get both Cabal and GHC is to install the [[http://www.haskell.org/platform/|Haskell Platform]]. Please see the documentation for more information about the installation process.

=== Pre-trained model ===

We provide Concraft-pl models trained on the [[http://clip.ipipan.waw.pl/LRT?action=AttachFile&do=view&target=NKJP-PodkorpusMilionowy-1.1.tgz|manually annotated subcorpus]] of the [[http://nkjp.pl/index.php?page=0&lang=1|National Corpus of Polish]]. Choose the appropriate model depending on the version of Concraft-pl you are using.

|| Version || Model ||
|| 0.1 || [[attachment:nkjp-model-0.1.gz|Download]] ||
|| 0.2 .. 0.6 || [[attachment:nkjp-model-0.2.gz|Download]] ||
More information can be found on the tool's website: [[https://github.com/kawu/concraft-pl]]
Line 31: Line 10:
 * Jakub Waszczuk. (2012). [[attachment:coling2012.pdf|Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language]]. <<BR>> In: Proceedings of COLING 2012, Mumbai, India.
 * Jakub Waszczuk. (2012). [[attachment:coling2012.pdf|Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language]]. <<BR>> In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 2789–2804, Mumbai, India, 2012.
 * Jakub Waszczuk, Witold Kieraś, and Marcin Woliński. (2018). [[https://hal.archives-ouvertes.fr/hal-01835573/document|Morphosyntactic disambiguation and segmentation for historical Polish with graph-based conditional random fields]]. <<BR>> In: Petr Sojka, Aleš Horák, Ivan Kopeček, and Karel Pala, editors, Text, Speech, and Dialogue: 21st International Conference, TSD 2018, Brno, Czech Republic, September 11-14, 2018.

Concraft-pl

Concraft-pl is a morphosyntactic tagger for the Polish language based on conditional random fields. The tool is coupled with Morfeusz, a morphosyntactic analyzer for Polish, which represents both morphosyntactic and segmentation ambiguities in the form of a directed acyclic graph (DAG).
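
To make the DAG idea concrete, here is a minimal Python sketch of how one word form can carry both kinds of ambiguity. It is only an illustration: the edge format, the simplified tags, and the helper names `build_graph` and `all_paths` are assumptions made for this example and do not reflect the actual Morfeusz or Concraft-pl data structures. The word "miałem" is used because it can be read either as one segment or as two.

```python
from collections import defaultdict

# Hypothetical edge list: (start_node, end_node, segment, tag).
# Tags are simplified placeholders, not real Morfeusz output.
EDGES = [
    (0, 1, "miał", "praet:m1"),       # verb reading, one gender variant
    (0, 1, "miał", "praet:m3"),       # same segment, different tag
    (1, 2, "em", "aglt"),             # agglutinate following the verb
    (0, 2, "miałem", "subst:inst"),   # noun reading spanning the whole word
]


def build_graph(edges):
    """Group outgoing edges by their start node."""
    graph = defaultdict(list)
    for start, end, segment, tag in edges:
        graph[start].append((end, segment, tag))
    return graph


def all_paths(graph, node, goal):
    """Yield every path from node to goal as a list of (segment, tag) pairs."""
    if node == goal:
        yield []
        return
    for end, segment, tag in graph[node]:
        for rest in all_paths(graph, end, goal):
            yield [(segment, tag)] + rest


if __name__ == "__main__":
    for path in all_paths(build_graph(EDGES), 0, 2):
        print(path)
```

Each path through the graph is one candidate analysis. Parallel edges over the same span model morphosyntactic ambiguity, while paths with different numbers of edges model segmentation ambiguity. A tagger working on such a graph has to pick a single path.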

More information can be found on the tool's website: https://github.com/kawu/concraft-pl

Publications