Revision 1 as of 2017-02-20 23:00:56

SkładnicaMWE

SkładnicaMWE is a constituency version of the Składnica treebank annotated with various types of multiword expressions. It was created within the IC1207 COST Action PARSEME.

Some aspects of its construction, contents and use have been described in:

  • SAVARY, A., WASZCZUK, J. (2017): "Projecting multiword expression resources on a Polish treebank", in the Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2017), 4 April 2017, Valencia, Spain.

The lexicon contains about 3200 multi-word lexemes, 68,000 corresponding inflected forms, and 160 graph-based inflection paradigms, with the following distribution:

  • 3,705 nominal compounds (e.g. bajońskie sumy),

  • 422 adjectival compounds (e.g. prosty jak strzała, wprost proporcjonalny),

  • 609 adverbial compounds (e.g. chcąc nie chcąc),

  • 40 others (e.g. ni z gruszki, ni z pietruszki).

Authors

Tools

The lexicon was created with Toposław, a tool for developing and managing inflectional dictionaries of multi-word units. Toposław integrates:

  • Morfeusz SGJP -- a morphological analyser and generator of Polish,

  • Multiflex -- a morpho-syntactic generator of multi-word units,

  • a graph editor derived from Unitex.

License

The data are available under the CC BY-SA license.

Available resources

  • SEJF version 1.1
    • Slownik -- the binary source file in Toposław format

    • Multiflex-compatible archive including:

      • the list of morphologically annotated lexemes,
      • the list of corresponding inflected forms and variants,
      • inflection graphs compatible with the Unitex graph editor,

      • a list of known problems,
      • a README.txt file.
  • SEJF version 1.0: about 3,200 multi-word lexemes (2,121 nominal, 446 adjectival, 604 adverbial, 43 others), 68,000 corresponding inflected forms, and 160 graph-based inflection paradigms
    • Slownik -- the binary source file in Toposław format

    • Multiflex-compatible archive including:

      • the list of morphologically annotated lexemes,
      • the list of corresponding inflected forms and variants,
      • inflection graphs compatible with the Unitex graph editor,

      • a list of known problems,
      • a README.txt file.
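
The per-category figures quoted for SEJF version 1.0 can be cross-checked with a short tally. The sketch below is only an illustration: the dictionary name `counts_v1_0` and the helper `share` are hypothetical names introduced here, and the numbers are copied from the version 1.0 entry above rather than read from the distributed files.

```python
# Category counts for SEJF version 1.0, copied from the resource list above.
# The variable and function names are illustrative assumptions only.
counts_v1_0 = {
    "nominal": 2121,
    "adjectival": 446,
    "adverbial": 604,
    "other": 43,
}

# Exact number of lexemes implied by the per-category figures.
total = sum(counts_v1_0.values())


def share(category: str) -> float:
    """Return the percentage of version 1.0 lexemes in the given category."""
    return 100.0 * counts_v1_0[category] / total


if __name__ == "__main__":
    print(f"total lexemes: {total}")
    for name, count in counts_v1_0.items():
        print(f"{name:>10}: {count:5d} ({share(name):.1f}%)")
```

The per-category figures sum to 3,214, which agrees with the rounded figure of about 3,200 lexemes, and nominal compounds make up roughly two thirds of the version 1.0 lexicon.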

Future work

Defining an LMF format for the lexicon.