Differences between revisions 341 and 342

Natural Language Processing Seminar 2020–2021

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). All recorded talks are available on YouTube.

NOTE: Due to restrictions on admission to the Institute building, only staff and speakers (including external ones) may currently take part in the seminar on site. For all other participants the seminar will be broadcast – please use the link next to the presentation title.

5 October 2020

Piotr Rybak, Robert Mroczkowski, Janusz Tracz (ML Research at Allegro.pl), Ireneusz Gawlik (ML Research at Allegro.pl & AGH University of Science and Technology)

Review of BERT-based Models for Polish Language

In recent years, a series of BERT-based models has improved the performance of many natural language processing systems. During this talk, we will briefly introduce the BERT model as well as some of its variants. Next, we will focus on the available BERT-based models for the Polish language and their results on the KLEJ benchmark. Finally, we will dive into the details of the new model developed in cooperation between ICS PAS and Allegro.

2 November 2020

Inez Okulska (NASK National Research Institute)

Concise, robust, sparse? Algebraic transformations of word2vec embeddings versus precision of classification

The introduction of vector representations of words, which contain the weights of context and central words calculated by mapping giant corpora of a given language rather than encoding manually selected linguistic features of words, proved to be a breakthrough for NLP research. After the initial delight came revision and a search for improvements, primarily in order to broaden the context, to handle homonyms, etc. Nevertheless, classic embeddings still apply to many tasks, for example content classification, and in many cases their performance is still good enough. What do they encode? Do they contain redundant elements? If transformed or reduced, will they maintain the information in a way that still preserves the original "meaning"? What is the meaning here? How far can these vectors be deformed and how does it relate to encryption methods? In my talk I will present a reflection on this subject, illustrated by the results of various "tortures" of the embeddings (word2vec and GloVe) and their precision in the task of classifying texts whose content must remain masked for human users.

14 December 2020

Piotr Przybyła (Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences)

Multi-Word Lexical Simplification

The presentation will cover the task of multi-word lexical simplification, in which a sentence in natural language is made easier to understand by replacing a fragment of it with a simpler alternative, both of which can consist of many words. In order to explore this new direction, a corpus (MWLS1) including 1462 sentences in English from various sources with 7059 simplifications was prepared through crowdsourcing. Additionally, an automatic solution (Plainifier) for the problem, based on a purpose-trained neural language model, will be discussed along with its evaluation against human and resource-based baselines. The results of the presented study were also published at the COLING 2020 conference in an article of the same title.

Please see also the talks given in 2000–2015 and 2015–2020.

-  Revision 341 as of 2020-10-31 18:07:52
  Size: 6491
  Editor: MaciejOgrodniczuk
  Comment:
+  Revision 342 as of 2020-11-02 10:36:30
  Size: 7254
  Editor: MaciejOgrodniczuk
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 22:
-||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be made available shortly.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">The presentation will cover the task of multi-word lexical simplification, in which a sentence in natural language is made easier to understand by replacing its fragment with a simpler alternative, both of which can consist of many words. In order to explore this new direction, a corpus (MWLS1) including 1462 sentences in English from various sources with 7059 simplifications was prepared through crowdsourcing. Additionally, an automatic solution (Plainifier) for the problem, based on a purpose-trained neural language model, will be discussed along with the evaluation, comparing to human and resource-based baselines. The results of the presented study were also published at the COLING 2020 conference in [[https://coling2020.org/pages/accepted_papers_main_conference|an article of the same title]].||
