Grammatical Lexicon of Warsaw Urban Proper Names
The Grammatical Lexicon of Warsaw Urban Proper Names (SAWA - Słownik elektroniczny nAzewnictwa WArszawy) is an electronic lexicon containing about 9,000 proper names of places related to the Warsaw transportation system, i.e. names of streets, squares, monuments, buildings, bus, tram and subway stops, etc., as well as names of persons to whom some objects (notably streets) are dedicated. Former names (notably those used before 1989) are also included. The morphosyntax of the names is described by over 450 graph-based inflection paradigms, which allow the automatic generation of about 300,000 inflectional and syntactic variants. The lexicon has been developed within a French-Polish Polonium project and within a nationally funded Polish project.
Some aspects of its construction, contents and use have been described in:
- SAVARY, A., RABIEGA-WIŚNIEWSKA, J., WOLIŃSKI, M. (2009): "Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex", in MARCINIAK, M., MYKOWIECKA, A. (eds.) "Aspects of Natural Language Processing", Lecture Notes in Computer Science 5070, Springer Verlag, pp. 111-141.
- MARCINIAK, M., RABIEGA-WIŚNIEWSKA, J., SAVARY, A., WOLIŃSKI, M., HELIASZ, C. (2009): "Constructing an Electronic Dictionary of Polish Urban Proper Names", in Recent Advances in Intelligent Information Systems (Proceedings of the Balto-Slavonic Natural Language Processing Workshop, Kraków), Academic Publishing House EXIT, Warsaw, pp. 743-749.
The lexicon contains names of objects of the following types:
4837 communication ways: streets (e.g. ulica Generała Kazimierza Pułaskiego), squares (e.g. Plac Komuny Paryskiej), bridges (e.g. most Śląsko-Dąbrowski),
1195 names of persons to whom some urban objects (notably streets) are dedicated (e.g. Kazimierz Pułaski).
Number of entries per dictionary file:
- 435 SAWA-budowle-dlc.dic
- 8934 SAWA-dlc.dic
- 618756 SAWA-dlcf.dic
- 4837 SAWA-drogi-dlc.dic
- 34 SAWA-hydronimy-dlc.dic
- 385 SAWA-obszary-dlc.dic
- 115 SAWA-pomniki-dlc.dic
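Per-file counts like those listed above can be recomputed from an unpacked copy of the archive. The following is a minimal sketch, not part of the SAWA distribution: it assumes that each .dic file holds one entry per line and is encoded in UTF-8 (Unitex-style dictionaries are sometimes UTF-16, in which case the encoding argument must be changed). The function names count_entries and count_all are hypothetical.

```python
from pathlib import Path


def count_entries(path, encoding="utf-8"):
    """Count non-empty lines in a dictionary file.

    Assumption: one entry per line; the encoding is a guess and may
    need to be changed (e.g. to "utf-16") for a given copy of the files.
    """
    with open(path, encoding=encoding) as handle:
        return sum(1 for line in handle if line.strip())


def count_all(directory, pattern="SAWA-*.dic"):
    """Return {file name: entry count} for every matching file in `directory`."""
    return {
        path.name: count_entries(path)
        for path in sorted(Path(directory).glob(pattern))
    }


if __name__ == "__main__":
    # Print the counts in the same "count file-name" layout as the list above.
    for name, total in count_all(".").items():
        print(f"{total:>7} {name}")
```

Run from the directory containing the dictionary files, the script prints one line per file, which can be compared against the published counts.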
Authors
Monika Czerepowicka - lexicography
Agata Savary - automatic inflection and validation
Tools
The lexicon has been created within Toposław, a tool for developing and managing inflectional dictionaries of multi-word units. Toposław integrates:
Morfeusz SGJP -- a morphological analyser and generator of Polish,
Multiflex -- a morpho-syntactic generator of multi-word units,
a graph editor stemming from Unitex.
License
The data are available under the CC-BY-SA license.
Available resources
Multiflex-compatible archive containing:
- the list of morphologically annotated lexemes,
- the list of corresponding inflected forms and variants,
- inflection graphs compatible with the Unitex graph editor,
- a list of known problems.
Future work
Defining an LMF format for the lexicon.