#acl +All:read Default

= SkładnicaMWE =

SkładnicaMWE is a constituency version of the [[http://zil.ipipan.waw.pl/Składnica|Składnica]] treebank annotated with various types of multiword expressions (MWEs). It was created as part of Jakub Waszczuk's PhD thesis work and was partly funded by the IC1207 COST action [[http://www.parseme.eu|PARSEME]].

Some aspects of its construction, contents and use are described in:

 * SAVARY, A., WASZCZUK, J. (2017): "[[http://aclweb.org/anthology/W/W17/W17-1404.pdf|Projecting multiword expression resources on a Polish treebank]]", in Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing ([[http://aclweb.org/anthology/W/W17/W17-1404.pdf|BSNLP 2017]]), 4 April 2017, Valencia, Spain.

The pre-annotation was performed by automatically projecting three Polish MWE resources:

 * the named entity layer of the [[http://clip.ipipan.waw.pl/NationalCorpusOfPolish|National Corpus of Polish]],
 * [[http://zil.ipipan.waw.pl/SEJF|SEJF]], the grammatical lexicon of Polish nominal, adjectival and adverbial MWEs,
 * [[http://zil.ipipan.waw.pl/Walenty|Walenty]], a Polish valence dictionary with a phraseological component (2015 version).

All automatic pre-annotation results were manually validated.

The treebank contains about 2,000 MWE annotations in about 9,000 constituency trees, with the following distribution:

 * 1,304 multiword named entities (e.g. ''Buenos Aires'', ''Ministerstwo Pracy i Polityki Socjalnej''),
 * 368 nominal, adjectival and adverbial compounds (e.g. ''związki zawodowe'', ''jedyny w swoim rodzaju'', ''przede wszystkim''),
 * 365 verbal MWEs (e.g. ''wejść w życie'', ''pominąć milczeniem'', ''zaleźć za skórę'', ''udzielić rady'').

== Authors ==

 * [[http://zil.ipipan.waw.pl/JakubWaszczuk|Jakub Waszczuk]]
 * [[http://www.info.univ-tours.fr/~savary/English/indexgb.html|Agata Savary]]

== License ==

The data are available under the [[https://www.gnu.org/licenses/gpl-3.0.en.html|GPLv3 license]].

== Available resources ==

 * [[attachment:SkladnicaMWE-1.0.zip|SkładnicaMWE v 1.0]] -- a version of Składnica (containing only the correct parses) with MWE annotations, in a custom XML format. Token identifiers are compatible with the original Składnica corpus.

== Future work ==

 * Repeating the automatic mapping and manual validation with more recent versions of [[http://zil.ipipan.waw.pl/Składnica|Składnica]] and of [[http://zil.ipipan.waw.pl/Walenty|Walenty]].
 * Enhancing the lexicon projection to include more fine-grained syntactic constraints.
 * Enhancing the annotation schema towards a standard format.
 * Linking the MWE occurrences in the treebank with their entries in lexicons.