Polish TAG Grammar

This is a Tree Adjoining Grammar (TAG) for Polish. A description of the TAG formalism can be found in this paper: http://www.seas.upenn.edu/~joshi/joshi-schabes-tag-97.pdf. The grammar was extracted automatically from Składnica, a Polish constituency treebank. The extraction procedure is based on the one described in this paper: http://nlp.cs.nyu.edu/nycnlp/autoextract.ps.

Author: Katarzyna Krasnowska
License: GPL v3

The grammar can be used with TuLiPA-pl, a modified version of TuLiPA (https://sourcesup.cru.fr/tulipa/), which is included with the grammar as a Java jar file. More information on the usage of TuLiPA-pl can be found in the README file. The grammar follows the 3-layer design adopted by the authors of TuLiPA (grammar, lexicon, morphology), but provides only the first two layers. The morphology can either be provided by the user or generated by TuLiPA-pl during parsing (using the Morfeusz morphological analyser).

The grammar and lexicon are in the XMG (http://wiki.loria.fr/wiki/XMG/Documentation) and LEX2ALL (http://wiki.loria.fr/wiki/LEX2ALL) formats, respectively. The grammar contains 2802 elementary tree families (1825 initial trees and 977 auxiliary trees). The lexicon contains 11515 lexemes, anchoring a total of 23399 trees (one lexeme can serve as a lexical anchor for more than one tree, e.g. in the case of verbs with more than one possible valence frame).
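
The figures above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the constants are copied from the paragraph above, while the function name `summarize` and the derived ratios are our own additions and are not part of the grammar or of TuLiPA-pl.

```python
# Figures quoted in the description of the grammar and lexicon.
INITIAL_TREES = 1825
AUXILIARY_TREES = 977
TREE_FAMILIES = 2802
LEXEMES = 11515
ANCHORED_TREES = 23399


def summarize():
    """Return simple consistency checks and ratios derived from the quoted figures.

    The function name and the returned keys are hypothetical, chosen for
    illustration only.
    """
    return {
        # Initial and auxiliary trees should add up to the quoted total.
        "counts_consistent": INITIAL_TREES + AUXILIARY_TREES == TREE_FAMILIES,
        # Fraction of the elementary tree families that are auxiliary trees.
        "auxiliary_share": AUXILIARY_TREES / TREE_FAMILIES,
        # Average number of trees anchored by a single lexeme.
        "trees_per_lexeme": ANCHORED_TREES / LEXEMES,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

On these numbers a lexeme anchors roughly two trees on average, which is consistent with the remark that a single lexeme may anchor several trees.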

Contents of the package

The package pl-TAG.tar.gz contains:

  • grammar/ directory, which contains the TAG grammar for Polish:
    • polish.mg - the grammar file in XMG metagrammar format
    • polish.xml - the same grammar in XML format, used by TuLiPA-pl
    • polish-lex - the lexicon file in LEX2ALL format
    • polish-lex.xml - the same lexicon in XML format, used by TuLiPA-pl
  • TuLiPA-pl.jar - a Java jar archive containing the parser
  • README file
  • licence text (GPL v3)
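
After downloading the package, the listing above can be checked against the archive with a short Python sketch. It relies only on the standard `tarfile` module. The helper names `missing_files` and `check_archive` are hypothetical, and the exact directory layout inside the archive is an assumption, so the check compares base file names only. The README and licence files are left out because their exact file names are not stated here.

```python
import tarfile
from pathlib import PurePosixPath

# File names taken from the package listing above.
EXPECTED_NAMES = {
    "polish.mg",
    "polish.xml",
    "polish-lex",
    "polish-lex.xml",
    "TuLiPA-pl.jar",
}


def missing_files(member_names, expected=EXPECTED_NAMES):
    """Return the expected file names that do not occur among the archive members.

    Only the last path component is compared, because the top-level layout
    of the archive is an assumption.
    """
    present = {PurePosixPath(name).name for name in member_names}
    return sorted(expected - present)


def check_archive(path="pl-TAG.tar.gz"):
    """Open the gzip-compressed tarball and report any missing files."""
    with tarfile.open(path, "r:gz") as archive:
        return missing_files(archive.getnames())
```

Calling `check_archive()` in the directory containing pl-TAG.tar.gz returns an empty list when every listed file is present.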