Revision 4 as of 2012-07-20 09:57:35


SAWA

Grammatical Lexicon of Warsaw Urban Proper Names

The Grammatical Lexicon of Warsaw Urban Proper Names (SAWA - Słownik elektroniczny nAzewnictwa WArszawy) is an electronic lexicon containing about 9,000 proper names of places related to the Warsaw transportation system, i.e. names of streets, squares, monuments, buildings, and bus, tram and subway stops, as well as names of persons to whom some objects (notably streets) are dedicated. Previous names (notably those used before 1989) are also included. Their morphosyntax is described by over 450 graph-based inflection paradigms, which allow the automatic generation of about 300,000 inflectional and syntactic variants. It was developed within a French-Polish Polonium project and a nationally funded Polish project.

Some aspects of its construction, contents and use have been described in:

  • SAVARY, A., RABIEGA-WIŚNIEWSKA, J., WOLIŃSKI, M. (2009): "Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex", in MARCINIAK, M., MYKOWIECKA, A. (eds.) "Aspects of Natural Language Processing", Lecture Notes in Computer Science 5070, Springer Verlag, pp. 111-141.
  • MARCINIAK, M., RABIEGA-WIŚNIEWSKA, J., SAVARY, A., WOLIŃSKI, M., HELIASZ, C. (2009): "Constructing an Electronic Dictionary of Polish Urban Proper Names", in Recent Advances in Intelligent Information Systems (Proceedings of the Balto-Slavonic Natural Language Processing Workshop, Kraków), Academic Publishing House EXIT, Warsaw, pp. 743-749.

The lexicon contains names of objects of the following types:

  • 4837 communication ways: streets (e.g. ulica Generała Kazimierza Pułaskiego), squares (e.g. Plac Komuny Paryskiej), bridges (e.g. most Śląsko-Dąbrowski),

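  • 1933 transport points (punkty komunikacyjne), such as bus, tram and subway stops,

  • 435 buildings (budowle),

  • 385 areas (obszary),

  • 115 monuments (pomniki),

  • 34 hydronyms (hydronimy),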

  • 1195 names of persons to whom some urban objects (notably streets) are dedicated (e.g. Kazimierz Pułaski).

    • 435 SAWA-budowle-dlc.dic
    • 8934 SAWA-dlc.dic
    • 618756 SAWA-dlcf.dic
    • 4837 SAWA-drogi-dlc.dic
    • 34 SAWA-hydronimy-dlc.dic
    • 385 SAWA-obszary-dlc.dic
    • 1195 SAWA-osoby-dlc.dic
    • 115 SAWA-pomniki-dlc.dic
    • 1933 SAWA-punkty-komunikacyjne-dlc.dic
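
The figures in the file listing can be cross-checked with a few lines of Python. This is only a sketch: it assumes that each number is the entry count of the named file, that SAWA-dlc.dic holds all lexemes, and that SAWA-dlcf.dic holds all generated forms. None of this is stated explicitly on this page, and the function names are illustrative.

```python
# Counts copied from the file listing; their interpretation as
# "entries per file" is an assumption, not stated in the source.
CATEGORY_COUNTS = {
    "SAWA-budowle-dlc.dic": 435,
    "SAWA-drogi-dlc.dic": 4837,
    "SAWA-hydronimy-dlc.dic": 34,
    "SAWA-obszary-dlc.dic": 385,
    "SAWA-osoby-dlc.dic": 1195,
    "SAWA-pomniki-dlc.dic": 115,
    "SAWA-punkty-komunikacyjne-dlc.dic": 1933,
}
TOTAL_LEXEMES = 8934   # SAWA-dlc.dic
TOTAL_FORMS = 618756   # SAWA-dlcf.dic


def categories_match_total(counts, expected_total):
    """Return True if the per-category counts add up to the overall count."""
    return sum(counts.values()) == expected_total


def forms_per_lexeme(total_forms, total_lexemes):
    """Average number of listed forms per lexeme (plain ratio of the two counts)."""
    return total_forms / total_lexemes


if __name__ == "__main__":
    print("Categories sum to total:",
          categories_match_total(CATEGORY_COUNTS, TOTAL_LEXEMES))
    print("Average forms per lexeme: %.1f"
          % forms_per_lexeme(TOTAL_FORMS, TOTAL_LEXEMES))
```

Under these assumptions the seven per-category files add up exactly to the 8934 entries of SAWA-dlc.dic.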

Authors

Tools

The lexicon has been created with Toposław, a tool for developing and managing inflectional dictionaries of multi-word units. Toposław integrates:

  • Morfeusz SGJP -- a morphological analyser and generator of Polish,

  • Multiflex -- a morpho-syntactic generator of multi-word units,

  • a graph editor derived from Unitex.

License

The data are available under the CC-BY-SA license.

Available resources

  • Slownik -- the binary source file in Toposław format

  • Multiflex-compatible archive containing:

    • the list of morphologically annotated lexemes,
    • the list of corresponding inflected forms and variants,
    • inflection graphs compatible with the Unitex graph editor,

    • a list of known problems.

Future work

Defining an LMF format for the lexicon.