SAWA

Grammatical Lexicon of Warsaw Urban Proper Names

The Grammatical Lexicon of Warsaw Urban Proper Names (SAWA: Słownik elektroniczny nAzewnictwa WArszawy) is an electronic lexicon containing about 9,000 proper names of places related to the Warsaw transportation system, i.e. names of streets, squares, monuments, buildings, bus, tram and subway stops, etc., as well as names of persons to whom some objects (notably streets) are dedicated. Stylistically marked names (e.g. Czterech Śpiących) and former names (notably those used before 1989) are also included. Their morphosyntax is described by over 450 graph-based inflection paradigms, which allow the automatic generation of about 300,000 inflectional and syntactic variants. The lexicon has been developed within a French-Polish Polonium project and within a nationally funded Polish project.

Some aspects of its construction, contents and use have been described in:

The lexicon contains names of objects with the following distribution:

  • 4837 communication ways: streets (e.g. ulica Generała Kazimierza Pułaskiego), squares (e.g. Plac Komuny Paryskiej), and bridges (e.g. most Śląsko-Dąbrowski),

  • 1933 communication points: bus, tram, subway and city train stops (e.g. przystanek Aleja Zjednoczenia, stacja Warszawa-ZOO), railway stations (e.g. Warszawa Wschodnia), and airports (e.g. Port Lotniczy imienia Fryderyka Chopina w Warszawie),

  • 435 buildings (e.g. Hala Marymoncka, kościół Świętego Jakuba Apostoła, Muzeum Historii Żydów Polskich, teatr „Kwadrat”, Akademia Medyczna),

  • 385 districts and areas (e.g. Sady Żoliborskie, Stegny, park imienia Romualda Traugutta, Cmentarz Ewangelicko-Augsburski),

  • 115 monuments (e.g. Grób Nieznanego Żołnierza),

  • 34 hydronyms (e.g. Kanał Żerański),

  • 1195 persons to whom some urban objects (notably streets) are dedicated (e.g. Kazimierz Pułaski).
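As a quick consistency check, the category counts listed above can be summed and compared with the "about 9,000" figure given in the introduction. The sketch below is only illustrative: the dictionary keys are shorthand labels chosen here, not identifiers from the lexicon files, and the numbers are copied from the list above.

```python
# Category counts copied from the distribution list above.
# The key names are informal labels, not field names from the lexicon.
counts = {
    "communication ways": 4837,
    "communication points": 1933,
    "buildings": 435,
    "districts and areas": 385,
    "monuments": 115,
    "hydronyms": 34,
    "persons": 1195,
}

total = sum(counts.values())  # 8934, consistent with "about 9,000"

# Share of each category in the whole lexicon, largest first.
shares = {
    name: n / total
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1])
}

for name, share in shares.items():
    print(f"{name:22s} {counts[name]:5d} {share:6.1%}")
print(f"{'total':22s} {total:5d}")
```

The total comes to 8,934 entries, with communication ways forming the largest group.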

Authors

  • Małgorzata Marciniak – project management,

  • Celina Heliasz <celina DOT heliasz AT SPAMFREE uw DOT edu DOT pl> – lexicography,

  • Joanna Rabiega-Wiśniewska – lexicography,
  • Piotr Sikora – programming,
  • Marcin Woliński – morphology of single words,

  • Agata Savary – automatic inflection and validation.

Tools

The lexicon has been created within Toposław, a tool for developing and managing inflectional dictionaries of multi-word units. Toposław integrates:

  • Morfeusz SGJP – a morphological analyser and generator of Polish,

  • Multiflex – a morpho-syntactic generator of multi-word units,

  • a graph editor stemming from Unitex.

License

The data are available under the CC BY-SA license.

Available resources

  • The binary source file in Toposław format,

  • Multiflex-compatible archive containing:

    • a list of morphologically annotated lexemes,
    • a list of corresponding inflected forms and variants,
    • inflection graphs compatible with the Unitex graph editor,

    • a list of known problems.

Future work

Defining an LMF (Lexical Markup Framework) format for the lexicon.