Page "plTAG", revision 8 as of 2013-03-21 14:48:49

Polish TAG Grammar

This is a Tree Adjoining Grammar (TAG) for Polish. A description of the TAG formalism can be found in this paper: http://www.seas.upenn.edu/~joshi/joshi-schabes-tag-97.pdf. The grammar was extracted automatically from Składnica, a Polish constituency treebank. The extraction procedure is based on the one described in this paper: http://nlp.cs.nyu.edu/nycnlp/autoextract.ps.

Author: Katarzyna Krasnowska
License: GPL v3

The grammar can be used with TuLiPA-pl, a modified version of TuLiPA (https://sourcesup.cru.fr/tulipa/), which is included with the grammar as a Java jar file. More information on the usage of TuLiPA-pl can be found in the README file. The grammar follows the 3-layer design adopted by the authors of TuLiPA (grammar, lexicon, morphology), but provides only the first two layers. The morphology can either be provided by the user or generated by TuLiPA-pl during parsing (using the Morfeusz morphological analyser).

The grammar and lexicon are in the XMG (http://wiki.loria.fr/wiki/XMG/Documentation) and LEX2ALL (http://wiki.loria.fr/wiki/LEX2ALL) formats, respectively. The grammar contains 2802 elementary tree families (1825 initial trees and 977 auxiliary trees). The lexicon contains 11515 lexemes, anchoring a total of 23399 trees (one lexeme can serve as a lexical anchor for more than one tree, e.g. in the case of verbs with more than one possible valence frame).
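
As a quick sanity check of the figures above, the following Python sketch recomputes the grammar size and the average number of trees anchored per lexeme. The variable names are illustrative and not part of the package; the numbers are copied from the paragraph above.

```python
# Counts quoted in the description above; the variable names are illustrative only.
INITIAL_TREES = 1825
AUXILIARY_TREES = 977
LEXEMES = 11515
ANCHORED_TREES = 23399

# Initial and auxiliary trees together give the reported grammar size.
total_trees = INITIAL_TREES + AUXILIARY_TREES

# Average number of trees anchored by a single lexeme.
trees_per_lexeme = ANCHORED_TREES / LEXEMES

print(total_trees)                  # 2802
print(round(trees_per_lexeme, 2))   # 2.03
```

On average a lexeme anchors roughly two trees, which is consistent with the remark that one lexeme can anchor several trees.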

Contents of the package

The package pl-TAG.tar.gz contains:

  • grammar/ directory, which contains the TAG grammar for Polish:
    • polish.mg - the grammar file in XMG metagrammar format
    • polish.xml - the same grammar in XML format, used by TuLiPA-pl
    • polish-lex - the lexicon file in LEX2ALL format
    • polish-lex.xml - the same lexicon in XML format, used by TuLiPA-pl
  • TuLiPA-pl.jar - a Java jar archive containing the parser
  • README file
  • licence text (GPL v3)
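
The listing above can be checked against a downloaded archive with a short Python sketch like the one below. It is only an illustration under stated assumptions: the helper names are hypothetical, "README" is assumed to be the literal file name, the licence file is left out because its name is not given, and entries are matched by path suffix because the archive may or may not wrap everything in a top-level directory.

```python
import tarfile

# Relative paths taken from the listing above. The licence file is omitted
# because its file name is not stated; "README" is assumed to be the literal name.
EXPECTED = [
    "grammar/polish.mg",
    "grammar/polish.xml",
    "grammar/polish-lex",
    "grammar/polish-lex.xml",
    "TuLiPA-pl.jar",
    "README",
]


def missing_entries(member_names, expected=EXPECTED):
    """Return the expected paths that no archive member ends with."""
    normalized = [name.rstrip("/") for name in member_names]
    return [
        path
        for path in expected
        if not any(name == path or name.endswith("/" + path) for name in normalized)
    ]


def check_archive(archive_path):
    """Open a .tar.gz archive and list the expected entries it lacks."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return missing_entries(archive.getnames())
```

Calling check_archive("pl-TAG.tar.gz") on a local copy of the package returns an empty list when every listed file is present.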