Natural Language Processing Seminar 2021–2022

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, currently online – please use the link next to the presentation title. All recorded talks are available on YouTube.

11 October 2021

Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw)

Polyadic Quantifiers in Heterofunctional Coordination

The aim of this talk is to provide a semantic analysis of a construction – Heterofunctional Coordination – which is typical of Slavic and some neighbouring languages. In this construction, expressions bearing different grammatical functions may be conjoined. I will propose a semantic analysis of such constructions based on the concept of generalized quantifiers (Mostowski; Lindström; Barwise and Cooper) and, more specifically, polyadic quantifiers (van Benthem; Keenan; Westerståhl). Some familiarity with the language of predicate logic should suffice to fully understand the talk; all linguistic concepts (including "coordination" and "grammatical functions") and logical concepts (including "generalized quantifiers" and "polyadic quantifiers") will be explained.

18 October 2021

Przemysław Kazienko, Jan Kocoń (Wrocław University of Technology)

Personalized NLP

Many natural language processing tasks, such as classifying offensive, toxic, or emotional texts, are inherently subjective in nature. This is a major challenge, especially with regard to the annotation process. Humans tend to perceive textual content in their own individual way. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high quality reference source. Existing machine learning methods commonly rely on agreed output values that are the same for all annotators. However, annotation guidelines for subjective content can limit annotators' decision-making freedom. Motivated by moderate annotation agreement on offensive and emotional content datasets, we hypothesize that a personalized approach should be introduced for such subjective tasks. We propose new deep learning architectures that take into account not only the content but also the characteristics of the individual. We propose different approaches for learning the representation and processing of data about text readers. Experiments were conducted on four datasets: Wikipedia discussion texts labeled with attack, aggression, and toxicity, and opinions annotated with ten numerical emotional categories. All of our models based on human biases and their representations significantly improve prediction quality in subjective tasks evaluated from an individual's perspective. Additionally, we have developed requirements for annotation, personalization, and content processing procedures to make our solutions human-centric.

8 November 2021

Ryszard Tuora, Łukasz Kobyliński (Institute of Computer Science, Polish Academy of Sciences)

Dependency Trees in Automatic Inflection of Multi Word Expressions in Polish

Natural language generation for morphologically rich languages can benefit from automatic inflection systems. This work presents such a system, with particular emphasis on the inflection of Multi Word Expressions (MWEs). Inflection is performed using rules induced automatically from a dependency treebank. The system is evaluated on a dictionary of Polish MWEs. Additionally, a similar algorithm can be utilized for lemmatization of MWEs. In principle, the system can also be applied to other languages with similar morphological mechanisms. To demonstrate this, we will present a simple solution for Russian.

29 November 2021 (NOTE: the seminar will start at 10:00)

Piotr Przybyła (Institute of Computer Science, Polish Academy of Sciences)

When classification accuracy is not enough: Explaining news credibility assessment and measuring users' reaction

Automatic assessment of text credibility has recently become a very popular task in NLP, with many solutions proposed and evaluated through accuracy-based measures. However, little attention has been given to the deployment scenarios for such models that would reduce the spread of misinformation, as intended. Within the study presented here, two credibility assessment techniques were implemented in a browser extension, which was then used in a user study, making it possible to answer questions in three areas. Firstly, how can resource-intensive NLP models be compressed to work in a constrained environment? Secondly, what interpretability and visualisation techniques are most effective in human-computer cooperation? Thirdly, are users relying on such automated tools really more effective in spotting fake news?

6 December 2021

Joanna Byszuk (Institute of Polish Language, Polish Academy of Sciences)

Towards multimodal stylometry – possibilities and challenges of new approach to film and TV series analysis

This talk will present a proposal for a novel approach to the quantitative analysis of multimodal works, using the example of a corpus of the Doctor Who television series; the approach draws on stylometry and the multimodal theory of film analysis. Stylometric methods have long been popular in the analysis of literary texts. They usually involve comparing texts based on the frequencies of selected features, which create "stylometric fingerprints", i.e. patterns characteristic of authors, genres and other factors. They are, however, rarely applied to data other than text, with a few new approaches applying stylometry to the study of dance movements (works by Miguel Escobar Varela) or music (Backer and Kranenburg). The multimodal theory of film analysis is in turn a relatively new approach (developed primarily by John Bateman and Janina Wildfeuer), emphasizing the importance of examining information from various image, language and sound modalities for a more comprehensive interpretation. The presented approach uses the stylometric method of comparison but takes multiple types of features from various film modalities, i.e. features of image and sound as well as the content of the spoken dialogues. The talk will discuss the benefits and challenges of such an approach and of quantitative film media analysis in general.

Please see also the talks given in 2000–2015 and 2015–2020.
