#acl +All:read Default

SkładnicaMWE

SkładnicaMWE is a constituency version of the [[http://zil.ipipan.waw.pl/Składnica|Składnica]] treebank annotated with various types of multiword expressions (MWEs). It was created as part of Jakub Waszczuk's PhD thesis work and partly funded by the IC1207 COST action [[http://www.parseme.eu|PARSEME]].

Some aspects of its construction, contents and use have been described in:
 * SAVARY, A., WASZCZUK, J., (2017): "[[http://aclweb.org/anthology/W/W17/W17-1404.pdf|Projecting multiword expression resources on a Polish treebank]]", in the Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing ([[http://aclweb.org/anthology/W/W17/W17-1404.pdf|BSNLP 2017]]), 4 April 2017, Valencia, Spain.

The pre-annotation was performed by automatically projecting three Polish MWE resources:
 * the named entity layer of the [[http://clip.ipipan.waw.pl/NationalCorpusOfPolish|National Corpus of Polish]],
 * [[http://zil.ipipan.waw.pl/SEJF|SEJF]], the grammatical lexicon of Polish nominal, adjectival and adverbial MWEs,
 * [[http://zil.ipipan.waw.pl/Walenty|Walenty]], a Polish valence dictionary with phraseological component (2015 version).

All automatic pre-annotation results were manually validated.

The treebank contains about 2,000 MWE annotations (2,037 across the three categories below) in about 9,000 constituency trees, with the following distribution:
 * 1,304 multiword named entities (e.g. ''Buenos Aires'', ''Ministerstwo Pracy i Polityki Socjalnej''),
 * 368 nominal, adjectival and adverbial compounds (e.g. ''związki zawodowe'', ''jedyny w swoim rodzaju'', ''przede wszystkim''),
 * 365 verbal MWEs (e.g. ''wejść w życie'', ''pominąć milczeniem'', ''zaleźć za skórę'', ''udzielić rady'').

== Authors ==
 * [[http://zil.ipipan.waw.pl/JakubWaszczuk|Jakub Waszczuk]]
 * [[http://www.info.univ-tours.fr/~savary/English/indexgb.html|Agata Savary]]

== License ==
The data are available under the [[https://www.gnu.org/licenses/gpl-3.0.en.html|GPLv3 license]].

== Available resources ==
 * [[attachment:SkladnicaMWE-1.0.zip|SkładnicaMWE v 1.0]] -- a version of Składnica (containing only the correct parses) with MWE annotations, in a custom XML format. Token identifiers are compatible with the original Składnica corpus.

== Future work ==
 * Repeating the automatic mapping and manual validation with more recent versions of [[http://zil.ipipan.waw.pl/Składnica|Składnica]] and of [[http://zil.ipipan.waw.pl/Walenty|Walenty]].
 * Enhancing the lexicon projection to include more fine-grained syntactic constraints.
 * Enhancing the annotation schema towards a standard format.
 * Linking the MWE occurrences in the treebank with their entries in lexicons.