Differences between revisions 1 and 424 (spanning 423 versions)

Natural Language Processing Seminar 2021–2022

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, currently online – please use the link next to the presentation title. All recorded talks are available on YouTube.

11 October 2021 (NOTE: the seminar will start at 10:00)

Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw)

Polyadic Quantifiers in Heterofunctional Coordination

The aim of this talk is to provide a semantic analysis of a construction – Heterofunctional Coordination – which is typical of Slavic and some neighbouring languages. In this construction, expressions bearing different grammatical functions may be conjoined. In this talk, I will propose a semantic analysis of such constructions based on the concept of generalized quantifiers (Mostowski; Lindström; Barwise and Cooper), and more specifically – polyadic quantifiers (van Benthem; Keenan; Westerståhl). Some familiarity with the language of predicate logic should suffice to fully understand the talk; all linguistic concepts (including "coordination", "grammatical functions") and logical concepts (including "generalized quantifiers" and "polyadic quantifiers") will be explained in the talk.

18 October 2021

Jan Kocoń, Przemysław Kazienko (Wrocław University of Technology)

Personalized NLP

Many natural language processing tasks, such as classifying offensive, toxic, or emotional texts, are inherently subjective in nature. This is a major challenge, especially with regard to the annotation process. Humans tend to perceive textual content in their own individual way. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high quality reference source. Existing machine learning methods commonly rely on agreed output values that are the same for all annotators. However, annotation guidelines for subjective content can limit annotators' decision-making freedom. Motivated by moderate annotation agreement on offensive and emotional content datasets, we hypothesize that a personalized approach should be introduced for such subjective tasks. We propose new deep learning architectures that take into account not only the content but also the characteristics of the individual. We propose different approaches for learning the representation and processing of data about text readers. Experiments were conducted on four datasets: Wikipedia discussion texts labeled with attack, aggression, and toxicity, and opinions annotated with ten numerical emotional categories. All of our models based on human biases and their representations significantly improve prediction quality in subjective tasks evaluated from an individual's perspective. Additionally, we have developed requirements for annotation, personalization, and content processing procedures to make our solutions human-centric.

8 November 2021

Ryszard Tuora, Łukasz Kobyliński (Institute of Computer Science, Polish Academy of Sciences)

Dependency Trees in Automatic Inflection of Multi Word Expressions in Polish

Talk summary will be available shortly.

29 November 2021 (NOTE: the seminar will start at 10:00)

Piotr Przybyła (Institute of Computer Science, Polish Academy of Sciences)

When classification accuracy is not enough: Explaining news credibility assessment and measuring users' reaction

Talk summary will be available shortly.

6 December 2021

Joanna Byszuk (Institute of Polish Language, Polish Academy of Sciences)

Towards multimodal stylometry – possibilities and challenges of new approach to film and TV series analysis

This talk will present a proposal of novel approach to quantitative analysis of multimodal works on the example of the corpus of Doctor Who television series, which draws from stylometry and multimodal theory of film analysis. Stylometric methods have long been popular in the analysis of literary texts. They usually include comparision of texts based on the frequencies of use of selected features which create "stylometric fingerprints", i.e. patterns characteristic of authors, genres and other factors. They are, however, rarely applied to data other than text, with a few new approaches applying stylometry to the study of dance movements (works by Miguel Escobar Varela) or music (Backer and Kranenburg). Multimodal theory of film analysis is in turn a relatively new approach (developed primarily by John Bateman and Janina Wildfeuer), emphasizing the importance of examining information from various image, language and sound modalities for a more comprehensive interpretation. The presented approach uses stylometric method of comparison but taking multiple types of features from various film modalities, i.e. features of image and sound as well as the content of the spoken dialogues. The talk will discuss the benefits and challenges of such an approach and quantitative film media analysis in general.

Please see also the talks given in 2000–2015 and 2015–2020.

-  ⇤ ← Revision 1 as of 2016-06-27 22:35:36 → 
  Size: 834
  Editor: MaciejOgrodniczuk
  Comment:
+   ← Revision 424 as of 2021-10-05 14:21:09 → ⇥
  Size: 9397
  Editor: MaciejOgrodniczuk
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-= Natural Language Processing Seminar 2016–2017 =
+= Natural Language Processing Seminar 2021–2022 =
 Line 5:
-||<style="border:0;padding:0">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). ||<style="border:0;padding-left:30px">[[seminarium-archiwum|{{attachment:pl.png}}]]||
+||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pjl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, currently online – please use the link next to the presentation title. All recorded talks are available on [[https://www.youtube.com/ipipan|YouTube]]. ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]||
 Line 7:
-||<style="border:0;padding-top:10px">Please come back in October! And now see [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given between 2000 and 2015]] and [[http://zil.ipipan.waw.pl/seminar|2015-16]].
+||<style="border:0;padding-top:5px;padding-bottom:5px">'''11 October 2021''' (NOTE: the seminar will start at 10:00)||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Adam Przepiórkowski''' (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw)||
||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://teams.microsoft.com/l/meetup-join/19%3a06de5a6d7ed840f0a53c26bf62c9ec18%40thread.tacv2/1633426362584?context=%7b%22Tid%22%3a%220425f1d9-16b2-41e3-a01a-0c02a63d13d6%22%2c%22Oid%22%3a%2256c98727-58a9-4bc2-a706-2e47ff6ae312%22%7d|{{attachment:seminarium-archiwum/teams.png}}]] '''Polyadic Quantifiers in Heterofunctional Coordination''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The aim of this talk is to provide a semantic analysis of a construction – Heterofunctional Coordination – which is typical of Slavic and some neighbouring languages.  In this construction, expressions bearing different grammatical functions may be conjoined.  In this talk, I will propose a semantic analysis of such constructions based on the concept of generalized quantifiers (Mostowski; Lindström; Barwise and Cooper), and more specifically – polyadic quantifiers (van Benthem; Keenan; Westerståhl).  Some familiarity with the language of predicate logic should suffice to fully understand the talk; all linguistic concepts (including "coordination", "grammatical functions") and logical concepts (including "generalized quantifiers" and "polyadic quantifiers") will be explained in the talk.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''18 October 2021'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Jan Kocoń''', '''Przemysław Kazienko''' (Wrocław University of Technology)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Personalized NLP''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Many natural language processing tasks, such as classifying offensive, toxic, or emotional texts, are inherently subjective in nature. This is a major challenge, especially with regard to the annotation process. Humans tend to perceive textual content in their own individual way.  Most current annotation procedures aim to achieve a high level of agreement in order to generate a high quality reference source. Existing machine learning methods commonly rely on agreed output values that are the same for all annotators. However, annotation guidelines for subjective content can limit annotators' decision-making freedom. Motivated by moderate annotation agreement on offensive and emotional content datasets, we hypothesize that a personalized approach should be introduced for such subjective tasks. We propose new deep learning architectures that take into account not only the content but also the characteristics of the individual. We propose different approaches for learning the representation and processing of data about text readers. Experiments were conducted on four datasets: Wikipedia discussion texts labeled with attack, aggression, and toxicity, and opinions annotated with ten numerical emotional categories. All of our models based on human biases and their representations significantly improve prediction quality in subjective tasks evaluated from an individual's perspective. Additionally, we have developed requirements for annotation, personalization, and content processing procedures to make our solutions human-centric.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''8 November 2021'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Ryszard Tuora''', '''Łukasz Kobyliński''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Dependency Trees in Automatic Inflection of Multi Word Expressions in Polish''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''29 November 2021''' (NOTE: the seminar will start at 10:00)||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Piotr Przybyła''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''When classification accuracy is not enough: Explaining news credibility assessment and measuring users' reaction''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''6 December 2021'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Joanna Byszuk''' (Institute of Polish Language, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Towards multimodal stylometry – possibilities and challenges of new approach to film and TV series analysis''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">This talk will present a proposal of novel approach to quantitative analysis of multimodal works on the example of the corpus of Doctor Who television series, which draws from stylometry and multimodal theory of film analysis. Stylometric methods have long been popular in the analysis of literary texts. They usually include comparision of texts based on the frequencies of use of selected features which create "stylometric fingerprints", i.e. patterns characteristic of authors, genres and other factors. They are, however, rarely applied to data other than text, with a few new approaches applying stylometry to the study of dance movements (works by Miguel Escobar Varela) or music (Backer and Kranenburg). Multimodal theory of film analysis is in turn a relatively new approach (developed primarily by John Bateman and Janina Wildfeuer), emphasizing the importance of examining information from various image, language and sound modalities for a more comprehensive interpretation. The presented approach uses stylometric method of comparison but taking multiple types of features from various film modalities, i.e. features of image and sound as well as the content of the spoken dialogues. The talk will discuss the benefits and challenges of such an approach and quantitative film media analysis in general.||


||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2020]].||

{{{#!wiki comment

||<style="border:0;padding-top:5px;padding-bottom:5px">'''2 April 2020'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Stan Matwin''' (Dalhousie University)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Efficient training of word embeddings with a focus on negative examples''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}} {{attachment:seminarium-archiwum/icon-en.gif|Slides in English.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">This presentation is based on our [[https://pdfs.semanticscholar.org/1f50/db5786913b43f9668f997fc4c97d9cd18730.pdf|AAAI 2018]] and [[https://aaai.org/ojs/index.php/AAAI/article/view/4683|AAAI 2019]] papers on English word embeddings. In particular, we examine the notion of “negative examples”, the unobserved or insignificant word-context co-occurrences, in spectral methods. we provide a new formulation for the word embedding problem by proposing a new intuitive objective function that perfectly justifies the use of negative examples. With the goal of efficient learning of embeddings, we propose a kernel similarity measure for the latent space that can effectively calculate the similarities in high dimensions. Moreover, we propose an approximate alternative to our algorithm using a modified Vantage Point tree and reduce the computational complexity of the algorithm with respect to the number of words in the vocabulary. We have trained various word embedding algorithms on articles of Wikipedia with 2.3 billion tokens and show that our method outperforms the state-of-the-art in most word similarity tasks by a good margin. We will round up our discussion with some general thought s about the use of embeddings in modern NLP.||
}}}

Diff for "seminar"

Menu

Natural Language Processing Seminar 2021–2022