Natural Language Processing Seminar 2020–2021
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). All recorded talks are available on YouTube.
NOTE: Due to restrictions on admission to the Institute building, only staff and speakers (including external ones) may currently take part in the seminar in person. For all other participants the seminar will be broadcast on YouTube: https://www.youtube.com/channel/UC5PEPpMqjAr7Pgdvq0wRn0w.
5 October 2020
Piotr Rybak, Robert Mroczkowski, Janusz Tracz (ML Research at Allegro.pl), Ireneusz Gawlik (ML Research at Allegro.pl & AGH University of Science and Technology)
Review of BERT-based Models for Polish Language
In recent years, a series of BERT-based models has improved the performance of many natural language processing systems. In this talk, we will briefly introduce the BERT model as well as some of its variants. Next, we will focus on the BERT-based models available for Polish and their results on the KLEJ benchmark. Finally, we will discuss the details of the new model developed in cooperation between ICS PAS and Allegro.
19 October 2020
Inez Okulska (NASK National Research Institute)
Concise, robust, sparse? Algebraic transformations of word2vec embeddings versus precision of classification
The talk summary will be available shortly.
Please see also the talks given in 2000–2015 (http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html) and 2015–2020 (http://zil.ipipan.waw.pl/seminar-archive).

