Size: 832
Comment:
|
Size: 8224
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
= Natural Language Processing Seminar 2016–2017 = | = Natural Language Processing Seminar 2025–2026 = |
Line 5: | Line 5: |
||<style="border:0;padding:0">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]|| | ||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pjl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It will restart in October and will take place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on [[https://www.youtube.com/ipipan|YouTube]]. ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]|| |
Line 7: | Line 7: |
It's summer holiday season, please come back in October! And now see [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given between 2000 and 2015]] and [[http://zil.ipipan.waw.pl/seminar|2015-16]]. | ||<style="border:0;padding-top:5px;padding-bottom:5px">'''15 September 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Louis Esteve''' (Universite Paris-Saclay) || ||<style="border:0;padding-left:30px;padding-bottom:5px">'''[[attachment:seminarium-archiwum/2025-09-15.pdf|Diversity and dataset size – a quantitative perspective]]'''  {{attachment:seminarium-archiwum/icon-en.gif|Talk in English.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The field of Natural Language Processing (NLP) studies the abilities of computer systems to process and generate natural language, and has received increasing attention from the general population since the democratisation of generative and conversational models. However, behind the scenes, state-of-the-art NLP models are trained on ever-larger datasets, reaching trillions of tokens. It may be argued that the creation and use of such immense datasets is motivated by the idea that 'the larger the dataset, the more diverse it is', and that in turn 'if the training set is more diverse, it shall yield better models'. However, these statements thus far remain intuitions and need to be properly tested. To this end, this presentation will tackle methods and caveats of formal diversity quantification including limitations of the literature, a preliminary discussion on the link between diversity and dataset size, as well as their impact on downstream applications.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''6 October 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Stan Matwin''' (Dalhousie University / Institute of Computer Science, Polish Academy of Sciences) || ||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://www.youtube.com/watch?v=hwBs4D7clls|{{attachment:seminarium-archiwum/youtube.png}}]] '''[[attachment:seminarium-archiwum/2025-10-06.pdf|Deep, multi-faceted learning of diagnosing mental disorders from clinical interview records]]'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk in Polish.}} {{attachment:seminarium-archiwum/icon-en.gif|Slides partially in English.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The key characteristics of mental illnesses are reflected in audio recordings of clinical interviews with patients and their families. We have developed a deep learning method that automatically extracts the relevant features necessary for the diagnosis of mental illnesses (ADHD, depression, bipolar disorder and schizophrenia) from such interviews. We use a variety of pre-trained models to extract representations from both the audio segments of these interviews and their text versions. We use several modern representation techniques (embeddings). We apply a Big Data approach by exploring existing audio and text corpora annotated with emotional labels. We address the problem of annotated data scarcity by using parametric model fine-tuning (Parameter Efficient Fine-Tuning). All these representations are then combined into a single multimodal form. To diagnose the above mental disorders, we use contrastive learning and model synthesis using a committee of experts (Mixture of Experts). The results show that through multimodal analysis of clinical interviews, mental disorders can be diagnosed with satisfactory accuracy (project conducted in collaboration with H. Naderi and R. Uher).|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''20 October 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Arkadiusz Modzelewski''' (University of Padua / Polish-Japanese Academy of Information Technology)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''The Why and How of Disinformation: Datasets, Methods and Language Models Evaluation'''  {{attachment:seminarium-archiwum/icon-en.gif|Talk in English.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">What language tools do disinformation agents employ? Can incorporating persuasion and intent knowledge enhance the ability of large language models to detect disinformation? And how effective are LLMs at identifying disinformation in Polish and English? In this talk, I will present findings from my PhD research on disinformation, persuasion, and the intent behind misleading information. I will introduce one of the largest Polish disinformation datasets, alongside a novel English dataset, both designed to capture manipulative techniques and intent of disinformation agents. Drawing on these and other resources, I will discuss how well current LLMs perform in detecting disinformation, persuasion, and intent, and highlight promising directions for improving their effectiveness in disinformation detection.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''3 November 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Gražina Korvel''' (Vilnius University) || ||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''Talk title will be given soon'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary wiil be made available shortly.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''24 Novembe 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Jan Eliasz''', '''Mikołaj Langner''', '''Jan Kocoń''' (Wrocław University of Science and Technology) || ||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''Talk title will be given soon'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary wiil be made available shortly.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''8 December 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Filip Kucia''', '''Anna Wróblewska''', '''• Bartosz Grabek''', '''Szymon Trochimiak''' (Warsaw University of Technology) || ||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''How to Make Museums More Interactive? Case Study of the “Artistic Chatbot”'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary wiil be made available shortly.|| ||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2025]].|| {{{#!wiki comment ||<style="border:0;padding-top:5px;padding-bottom:5px">'''11 March 2024'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Mateusz Krubiński''' (Charles University in Prague)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''Talk title will be given shortly'''  {{attachment:seminarium-archiwum/icon-en.gif|Talk in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be made available soon.|| }}} |
Natural Language Processing Seminar 2025–2026
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It will restart in October and will take place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube. |
15 September 2025 |
Louis Esteve (Universite Paris-Saclay) |
The field of Natural Language Processing (NLP) studies the abilities of computer systems to process and generate natural language, and has received increasing attention from the general population since the democratisation of generative and conversational models. However, behind the scenes, state-of-the-art NLP models are trained on ever-larger datasets, reaching trillions of tokens. It may be argued that the creation and use of such immense datasets is motivated by the idea that 'the larger the dataset, the more diverse it is', and that in turn 'if the training set is more diverse, it shall yield better models'. However, these statements thus far remain intuitions and need to be properly tested. To this end, this presentation will tackle methods and caveats of formal diversity quantification including limitations of the literature, a preliminary discussion on the link between diversity and dataset size, as well as their impact on downstream applications. |
6 October 2025 |
Stan Matwin (Dalhousie University / Institute of Computer Science, Polish Academy of Sciences) |
|
The key characteristics of mental illnesses are reflected in audio recordings of clinical interviews with patients and their families. We have developed a deep learning method that automatically extracts the relevant features necessary for the diagnosis of mental illnesses (ADHD, depression, bipolar disorder and schizophrenia) from such interviews. We use a variety of pre-trained models to extract representations from both the audio segments of these interviews and their text versions. We use several modern representation techniques (embeddings). We apply a Big Data approach by exploring existing audio and text corpora annotated with emotional labels. We address the problem of annotated data scarcity by using parametric model fine-tuning (Parameter Efficient Fine-Tuning). All these representations are then combined into a single multimodal form. To diagnose the above mental disorders, we use contrastive learning and model synthesis using a committee of experts (Mixture of Experts). The results show that through multimodal analysis of clinical interviews, mental disorders can be diagnosed with satisfactory accuracy (project conducted in collaboration with H. Naderi and R. Uher). |
3 November 2025 |
Gražina Korvel (Vilnius University) |
Talk summary wiil be made available shortly. |
24 Novembe 2025 |
Jan Eliasz, Mikołaj Langner, Jan Kocoń (Wrocław University of Science and Technology) |
Talk summary wiil be made available shortly. |
Please see also the talks given in 2000–2015 and 2015–2025. |