Diff for "CoDeS"

Differences between revisions 75 and 99 (spanning 24 versions)
Revision 75 as of 2019-12-12 15:44:26
Size: 5752
Comment:
Revision 99 as of 2021-08-26 14:41:14
Size: 8000
Comment:
Line 46: Line 46:
[[attachment:FigAN.xlsx|FigAN|&do-get]]: a list of adjective-noun phrases labelled with one of three categories : L (literal phrase), M (non-literal phrases), B (ambiguous phrases) [[attachment:FigAN.xlsx|FigAN|&do-get]]:   a list of 1526 adjective-noun phrases labelled with one of three categories : L (literal phrase), M (non-literal phrase), B (ambiguous phrase)
Line 48: Line 48:
Metaphors corpus annotated by Joanna Marchula and Maciej Rosiński. It contains two versions ([[attachment:Korpus_9_11.zip|9_11|&do-get]] and [[attachment:Korpus_16_09|16_09|&do-get]]) of word-level anotations. You may likely want to work with the final annotations only ([[attachment:Korpus_9_11.zip|9_11|&do-get]]). Metaphors are marked for selected types of part-of-speech tags in a column named "Metafory". Metaphorically used words are marked as "M", literally used words as "L". More details can be found in the (LTC 2019) paper listed below. [[attachment:FigSen-1.xlsx|FigSen-1|&do-get]]: 1833 short fragments of text selected from the NKJP (National Corpus of Polish,
(Przepiórkowski et al., 2012)) in which all grammatically correct occurrences of all adjective-noun phrases were annotated at the phrase level either as literal (L) or figurative (M).


[[attachment:Korpus_9_11.zip|FigSen-2|&do-get]]: Word-level metaphors corpus annotated by Joanna Marchula and Maciej Rosiński. The same 1833 fragments with alternative annotation. It contains two versions ([[attachment:Korpus_9_11.zip|FigSen-2_9_11|&do-get]] and [[attachment:Korpus_16_09|FigSen-2_16_09|&do-get]]) of word-level annotations. You will most likely want to work with the final annotations only (9_11). Metaphors are marked for selected types of part-of-speech tags in a column named "Metafory". Metaphorically used words are marked as "M", literally used words as "L". More details can be found in the (LTC 2019) paper listed below.
Line 53: Line 58:
Python [[attachment:gibber-master.zip|package|&do-get]] for the WSD method presented in (A. Mykowiecks, P. Rychlik, Sz. Rutkowski, 2019). Python [[attachment:gibber-master.zip|package|&do-get]] for the WSD method presented in (Rutkowski, Sz., P. Rychlik, and A. Mykowiecka, 2019).
Line 59: Line 64:
 * Mykowiecka, A., M. Marciniak and P. Rychlik (2016) [[https://aclweb.org/anthology/W/W16/W16-4703.pdf|Recognition of non-domain phrases in automatically extracted lists of terms]], Proceedings of the 5th International Workshop on Computational Terminology Computerm2016  * Mykowiecka, A., M. Marciniak (2019) [[https://aclanthology.org/D19-6207.pdf | Experiments with ad hoc ambiguous abbreviation expansion]]. In Eben Holderness, Antonio Jimeno Yepes, Alberto Lavelli, Anne-Lyse Minard, James Pustejovsky, and Fabio Rinaldi, editors, Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), pages 44–53, Hong Kong, 2019. Association for Computational Linguistics
Line 61: Line 66:
 * Mykowiecka, A., M. Marciniak (2020), [[https://www.aclweb.org/anthology/2020.lrec-1.719 | Are White Ravens Ever White? - Non-Literal Adjective-Noun Phrases in Polish]], Proceedings of the 12th Language Resources and Evaluation Conference (LREC)

 * Wawer, A., M. Marciniak and A. Mykowiecka (2019) [[http://nlp.ipipan.waw.pl/Bib/ma:wa:my:2019.pdf | Detecting word level metaphors in Polish]], Proceedings of the 9th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC 2019).

 * Rutkowski, Sz., P. Rychlik, and A. Mykowiecka. Estimating senses with sets of lexically related words for Polish word sense disambiguation, Proceedings of the 10th Global WordNet Conference (GWC 2019).

 * Mykowiecka, A., M. Marciniak, and A. Wawer. [[https://www.aclweb.org/anthology/W18-09.pdf| Literal, metaphorical or both? Detecting metaphoricity in isolated adjective-noun phrases]]. In Beata Beigman Klebanov, Ekaterina Shutova, Patricia Lichtenstein, Smaranda Muresan, and Chee Wee, editors, Proceedings of the Workshop on Figurative Language Processing, pages 27–33. Association for Computational Linguistics, 2018.
 
 * Mykowiecka, A., A. Wawer, and M. Marciniak. [[https://www.aclweb.org/anthology/W18-09.pdf|Detecting figurative word occurrences using recurrent neural networks]]. In Beata Beigman Klebanov, Ekaterina Shutova, Patricia Lichtenstein, Smaranda Muresan, and Chee Wee, editors, Proceedings of the Workshop on Figurative Language Processing, pages 124–127. Association for Computational Linguistics, 2018.

 * Mykowiecka, A., M. Marciniak and P. Rychlik (2018) [[https://www.aclweb.org/anthology/L18-1381/|SimLex-999 for Polish]], LREC 2018.

 * Marciniak, M., A. Mykowiecka, and P. Rychlik. Recognition of irrelevant phrases in automatically extracted lists of domain terms. Terminology, 24(1):66–90, 2018.

 
Line 67: Line 87:
 * Mykowiecka, A., M. Marciniak and P. Rychlik (2018) !SimLex-999 for Polish, LREC 2018.
Line 69: Line 88:
 * Wawer, A., M. Marciniak and A. Mykowiecka (2019) Detecting word level metaphors in Polish, Proceedings of the 9th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC 2019).
Line 71: Line 89:
 * Mykowiecka, A., P. Rychlik, Sz. Rutkowski (to appear) Estimating senses with sets of lexically related words for Polish word sense disambiguation, Proceedings of the 10th Global WordNet Conference (GWC 2019).  * Mykowiecka, A., M. Marciniak and P. Rychlik (2016) [[https://aclweb.org/anthology/W/W16/W16-4703.pdf|Recognition of non-domain phrases in automatically extracted lists of terms]], Proceedings of the 5th International Workshop on Computational Terminology Computerm2016

CoDeS project

Project factsheet

English name: Compositional distributional semantic models for identification, discrimination and disambiguation of senses in Polish texts

Polish name: Wykorzystanie metod kompozycyjnej semantyki dystrybucyjnej do identyfikacji i rozróżniania znaczeń w języku polskim

Project type: A National Science Centre grant 2014/15/B/ST6/05186

Duration: Aug 2015 ‒ Aug 2018

Project Web page: http://zil.ipipan.waw.pl/CoDeS

NCN project info: https://projekty.ncn.gov.pl/index.php?s=4710

Principal investigator: Agnieszka Mykowiecka

Project summary

The general aim of the project is to evaluate existing methods and devise new techniques of computational distributional semantics (CDS) for discriminating and disambiguating senses in Polish. One of the basic problems that has to be solved when analyzing natural language utterances is the semantic ambiguity of their elements. Our goal is to develop methods for determining the meaning of a particular word in the context in which it appears, and methods for automatically detecting whether a word is used in more than one sense in an analyzed text. We want to focus on adjective-noun constructions and investigate how the meaning of a whole phrase may contribute to the creation of sense models for its elements. We also plan to extend these methods to recognize words or phrases that are used figuratively rather than in their literal sense. One of the project’s goals is to look for answers to some general questions, for example, to what degree sense differences can be described using CDS methods, and which types of models are better suited to such a task and to Polish data.

Resources

Word embeddings

Polish word embeddings http://dsmodels.nlp.ipipan.waw.pl/ (A. Mykowiecka, M. Marciniak, P. Rychlik, 2017)

DSmodels demo - web service for calculating word similarity using Polish word embeddings
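Word similarity over embeddings is commonly computed as the cosine of the angle between two word vectors. The sketch below illustrates that general idea only: the page does not state which measure the DSmodels demo uses, and the three-dimensional vectors and the `toy_vectors` name are invented for illustration, not taken from the published models.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "kot": [1.0, 0.0, 1.0],
    "pies": [1.0, 0.0, 0.5],
    "dom": [0.0, 1.0, 0.0],
}

sim_kot_pies = cosine_similarity(toy_vectors["kot"], toy_vectors["pies"])
sim_kot_dom = cosine_similarity(toy_vectors["kot"], toy_vectors["dom"])
print(sim_kot_pies > sim_kot_dom)
```

With real embeddings, the toy dictionary would be replaced by vectors loaded from one of the downloadable models.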

SimLex for Polish

Polish version of SimLex-999 (A. Mykowiecka, M. Marciniak, P. Rychlik, 2018).

Plain text (UTF-8 encoded) of Polish SimLex-999.

Literal/Non-Literal Adjective-Noun Phrases

FigAN: a list of 1526 adjective-noun phrases labelled with one of three categories: L (literal phrase), M (non-literal phrase), B (ambiguous phrase)

FigSen-1: 1833 short fragments of text selected from the NKJP (National Corpus of Polish, (Przepiórkowski et al., 2012)) in which all grammatically correct occurrences of all adjective-noun phrases were annotated at the phrase level either as literal (L) or figurative (M).

FigSen-2: Word-level metaphors corpus annotated by Joanna Marchula and Maciej Rosiński. The same 1833 fragments with alternative annotation. It contains two versions (FigSen-2_9_11 and FigSen-2_16_09) of word-level annotations. You will most likely want to work with the final annotations only (9_11). Metaphors are marked for selected types of part-of-speech tags in a column named "Metafory". Metaphorically used words are marked as "M", literally used words as "L". More details can be found in the (LTC 2019) paper listed below.
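A minimal sketch of how the "M" and "L" labels in the "Metafory" column might be tallied. Only the column name and the two label values come from the description above. The tab-separated layout, the `Token` and `Tag` columns, and the sample rows are assumptions made for illustration, so the actual files in the archive may need a different reader.

```python
import csv
import io
from collections import Counter


def count_metaphor_labels(lines, column="Metafory", delimiter="\t"):
    """Count "M" and "L" labels in the given column of a delimited table.

    Rows with an empty or different value are skipped, since only words
    with selected part-of-speech tags carry an annotation.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    counts = Counter()
    for row in reader:
        label = (row.get(column) or "").strip()
        if label in ("M", "L"):
            counts[label] += 1
    return counts


# Hypothetical sample; the real corpus layout is an assumption here.
SAMPLE = (
    "Token\tTag\tMetafory\n"
    "biały\tadj\tM\n"
    "kruk\tsubst\tM\n"
    "leci\tfin\tL\n"
    "nad\tprep\t\n"
)

counts = count_metaphor_labels(io.StringIO(SAMPLE))
print(counts["M"], counts["L"])
```

To read a file instead of the in-memory sample, pass an open text-mode file handle (UTF-8, opened with `newline=""`) as `lines`.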

Word Sense Disambiguation

Python package for the WSD method presented in (Rutkowski, Sz., P. Rychlik, and A. Mykowiecka, 2019).

Sense-annotated text for WSD testing.

Publications