CoDeS project
Project factsheet
| English name: | Compositional distributional semantic models for identification, discrimination and disambiguation of senses in Polish texts | 
| Polish name: | Wykorzystanie metod kompozycyjnej semantyki dystrybucyjnej do identyfikacji i rozróżniania znaczeń w języku polskim | 
| Project type: | National Science Centre (NCN) grant 2014/15/B/ST6/05186 | 
| Duration: | Aug 2015 ‒ Aug 2018 | 
| Project Web page: | http://zil.ipipan.waw.pl/CoDeS |
| NCN project info: | https://projekty.ncn.gov.pl/index.php?s=4710 |
| Principal investigator: | Agnieszka Mykowiecka | 
Project summary
The general aim of the project is to evaluate existing methods and devise new techniques of computational distributional semantics (CDS) for the discrimination and disambiguation of word senses in Polish. One of the basic problems to be solved when analyzing natural language utterances is the semantic ambiguity of their elements. Our goal is to develop methods that determine the meaning of a particular word in the context in which it appears, and methods that automatically detect whether a word is used in more than one sense in an analyzed text. We intend to focus on noun-adjective constructions and to investigate how the meaning of a whole phrase may contribute to the creation of sense models for its elements. We also plan to extend these methods to recognize words or phrases that are used figuratively rather than literally. One of the project’s goals is to answer some general questions, for example, to what degree sense differences can be described using CDS methods, and which types of models are best suited to this task and to Polish data.
Resources
Polish word embeddings http://dsmodels.nlp.ipipan.waw.pl/ (A. Mykowiecka, M. Marciniak, P. Rychlik, 2017)
DSmodels demo: a web service for calculating word similarity using Polish word embeddings.
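Word similarity in an embedding space is commonly measured as the cosine of the angle between two word vectors. The sketch below illustrates that measure; it assumes cosine similarity is what a similarity service of this kind computes, and the three-dimensional vectors for "kot", "pies" and "stół" are made-up toy values, not vectors from the DSmodels models.

```python
import math
from typing import Dict, List

# Hypothetical toy vectors for illustration only; real Polish embeddings
# have hundreds of dimensions and are loaded from a model file.
EMBEDDINGS: Dict[str, List[float]] = {
    "kot": [0.9, 0.1, 0.0],   # "cat"
    "pies": [0.8, 0.2, 0.1],  # "dog"
    "stół": [0.0, 0.3, 0.9],  # "table"
}


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def word_similarity(w1: str, w2: str) -> float:
    """Look up both words in the toy table and compare their vectors."""
    return cosine_similarity(EMBEDDINGS[w1], EMBEDDINGS[w2])


if __name__ == "__main__":
    print(f"kot/pies: {word_similarity('kot', 'pies'):.3f}")
    print(f"kot/stół: {word_similarity('kot', 'stół'):.3f}")
```

With these toy values, "kot" comes out far closer to "pies" than to "stół", which is the kind of ranking a similarity demo is meant to expose.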
Polish version of SimLex-999 (A. Mykowiecka, M. Marciniak, P. Rychlik, 2018).
Plain text (UTF-8 encoded) of Polish SimLex-999.
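Word-similarity datasets such as SimLex-999 are typically used to evaluate embeddings by computing Spearman's rank correlation between human similarity ratings and model scores for the same word pairs. The sketch below implements Spearman's rho as the Pearson correlation of tie-averaged ranks. The HUMAN and MODEL lists are invented numbers for four hypothetical pairs; reading the actual Polish SimLex-999 file is not shown because its column layout is not described here.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four word pairs: human ratings on a 0-10 scale
# and model cosine similarities for the same pairs.
HUMAN = [9.1, 7.5, 3.2, 0.8]
MODEL = [0.82, 0.35, 0.40, 0.05]

if __name__ == "__main__":
    print(f"Spearman rho: {spearman(HUMAN, MODEL):.3f}")
```

Rank correlation is the usual choice here because human ratings and cosine scores live on different scales; only the ordering of pairs is compared.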
Metaphors corpus annotated by Joanna Marchula and Maciej Rosiński. It contains two versions (9_11 and 16_09) of word-level annotations; you will most likely want to work only with the final annotations (9_11). Metaphors are marked for selected parts of speech in a column named "Metafory": metaphorically used words are marked as "M", literally used words as "L". More details can be found in the LTC 2019 paper listed below.
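A minimal sketch of tallying the "Metafory" labels. It assumes the corpus is a tab-separated file with a header row that names the columns; that layout, the "Form" column and the sample tokens are assumptions for illustration, so check the actual Korpus_9_11 files and adjust the delimiter or column name as needed. Tokens outside the annotated parts of speech (empty or other values) are skipped.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# ASSUMPTION: header row present and labels stored in a column named "Metafory".
COLUMN = "Metafory"


def count_labels(stream: TextIO, delimiter: str = "\t") -> Counter:
    """Count metaphorical ("M") and literal ("L") tokens; other values are skipped."""
    # QUOTE_NONE keeps tokens that contain quote characters from confusing the parser.
    reader = csv.DictReader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or COLUMN not in reader.fieldnames:
        raise ValueError(f"column {COLUMN!r} not found in header")
    counts: Counter = Counter()
    for row in reader:
        label = (row.get(COLUMN) or "").strip()
        if label in ("M", "L"):
            counts[label] += 1
    return counts


# Hypothetical sample in the assumed layout; "." has no annotation.
SAMPLE = "Form\tMetafory\nczas\tL\nucieka\tM\n.\t\n"

if __name__ == "__main__":
    counts = count_labels(io.StringIO(SAMPLE))
    total = counts["M"] + counts["L"]
    print(f"metaphorical: {counts['M']}, literal: {counts['L']}, ratio: {counts['M'] / total:.2f}")
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `count_labels`.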
Python package for the WSD method presented in (A. Mykowiecka, P. Rychlik, Sz. Rutkowski, 2019).
Sense-annotated text for WSD testing.
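One general way to disambiguate with embeddings is to represent each sense by the average vector of a set of lexically related words (for example, synonyms) and to pick the sense closest to the averaged context. The sketch below is a generic illustration of that idea only; it is not the code or API of the package above, and the sense inventory for "zamek" and all vectors are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 3-dimensional toy embeddings, for illustration only.
EMB: Dict[str, Vector] = {
    "twierdza": [0.9, 0.1, 0.0], "pałac": [0.8, 0.2, 0.1],
    "kłódka": [0.1, 0.9, 0.2], "klucz": [0.0, 0.8, 0.3],
    "król": [0.7, 0.1, 0.2], "wieża": [0.9, 0.0, 0.1],
    "drzwi": [0.1, 0.7, 0.4],
}

# Hypothetical sense inventory: each sense of "zamek" is described by related words.
SENSES: Dict[str, List[str]] = {
    "zamek/castle": ["twierdza", "pałac"],
    "zamek/lock": ["kłódka", "klucz"],
}


def mean_vector(words: Sequence[str]) -> Vector:
    """Average the vectors of the words that have an embedding."""
    vectors = [EMB[w] for w in words if w in EMB]
    if not vectors:
        raise ValueError("no known words to average")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def disambiguate(context: Sequence[str]) -> str:
    """Return the sense whose related-word centroid is closest to the context centroid."""
    ctx = mean_vector(context)
    return max(SENSES, key=lambda s: cosine(ctx, mean_vector(SENSES[s])))


if __name__ == "__main__":
    print(disambiguate(["król", "wieża"]))
    print(disambiguate(["drzwi"]))
```

A sense-annotated text such as the one above could then be used to measure how often the chosen sense matches the gold label.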
Publications
- Mykowiecka, A., M. Marciniak and P. Rychlik (2016) Recognition of non-domain phrases in automatically extracted lists of terms, Proceedings of the 5th International Workshop on Computational Terminology (CompuTerm 2016).
- Wawer, A. and A. Mykowiecka (2017) Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambiguous Synonyms, Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 120–125, Valencia, Spain, April 4, 2017.
- Wawer, A. and A. Mykowiecka (2017) Detecting Metaphorical Phrases in the Polish Language, Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2017), Varna.
- Mykowiecka, A., M. Marciniak and P. Rychlik (2017) Testing word embeddings for Polish, Cognitive Studies (17), DOI: 10.11649/cs.1468.
- Mykowiecka, A., M. Marciniak and P. Rychlik (2018) SimLex-999 for Polish, LREC 2018.
- Wawer, A., M. Marciniak and A. Mykowiecka (2019) Detecting word level metaphors in Polish, Proceedings of the 9th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC 2019).
- Mykowiecka, A., P. Rychlik and Sz. Rutkowski (to appear) Estimating senses with sets of lexically related words for Polish word sense disambiguation, Proceedings of the 10th Global WordNet Conference (GWC 2019).
