|
Size: 7066
Comment:
|
Size: 4809
Comment:
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 3: | Line 3: |
| = Natural Language Processing Seminar 2021–2022 = | = Natural Language Processing Seminar 2025–2026 = |
| Line 5: | Line 5: |
| ||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pjl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, currently online – please use the link next to the presentation title. All recorded talks are available on [[https://www.youtube.com/ipipan|YouTube]]. ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]|| | ||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pjl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It will restart in October and will take place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on [[https://www.youtube.com/ipipan|YouTube]]. ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]|| |
| Line 7: | Line 7: |
| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''11 October 2021''' (NOTE: the seminar will start at 10:00)|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Adam Przepiórkowski''' (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Jakie kwantyfikatory można wyrazić w języku naturalnym? O koordynacji i kwantyfikatorach poliadycznych'''  {{attachment:seminarium-archiwum/icon-pl.gif|Wystąpienie w języku polskim.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Streszczenie wystąpienia podamy już niedługo.|| |
||<style="border:0;padding-top:5px;padding-bottom:5px">'''15 September 2025'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Louis Esteve''' (Universite Paris-Saclay) || ||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://www.youtube.com/watch?v=2mLYixXC_Hw|{{attachment:seminarium-archiwum/teams.png}}]] '''Diversity and dataset size – a quantitative perspective]]'''  {{attachment:seminarium-archiwum/icon-en.gif|Talk in English.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The field of Natural Language Processing (NLP) studies the abilities of computer systems to process and generate natural language, and has received increasing attention from the general population since the democratisation of generative and conversational models. However, behind the scenes, state-of-the-art NLP models are trained on ever-larger datasets, reaching trillions of tokens. It may be argued that the creation and use of such immense datasets is motivated by the idea that 'the larger the dataset, the more diverse it is', and that in turn 'if the training set is more diverse, it shall yield better models'. However, these statements thus far remain intuitions and need to be properly tested. To this end, this presentation will tackle methods and caveats of formal diversity quantification including limitations of the literature, a preliminary discussion on the link between diversity and dataset size, as well as their impact on downstream applications.|| |
| Line 12: | Line 12: |
| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''18 October 2021'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Jan Kocoń''', '''Przemysław Kazienko''' (Wrocław University of Technology)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Personalized NLP'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Many natural language processing tasks, such as classifying offensive, toxic, or emotional texts, are inherently subjective in nature. This is a major challenge, especially with regard to the annotation process. Humans tend to perceive textual content in their own individual way. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high quality reference source. Existing machine learning methods commonly rely on agreed output values that are the same for all annotators. However, annotation guidelines for subjective content can limit annotators' decision-making freedom. Motivated by moderate annotation agreement on offensive and emotional content datasets, we hypothesize that a personalized approach should be introduced for such subjective tasks. We propose new deep learning architectures that take into account not only the content but also the characteristics of the individual. We propose different approaches for learning the representation and processing of data about text readers. Experiments were conducted on four datasets: Wikipedia discussion texts labeled with attack, aggression, and toxicity, and opinions annotated with ten numerical emotional categories. All of our models based on human biases and their representations significantly improve prediction quality in subjective tasks evaluated from an individual's perspective. Additionally, we have developed requirements for annotation, personalization, and content processing procedures to make our solutions human-centric.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''8 November 2021'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Ryszard Tuora''', '''Łukasz Kobyliński''' (Institute of Computer Science, Polish Academy of Sciences)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Dependency Trees in Automatic Inflection of Multi Word Expressions in Polish'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''22 November 2021''' (NOTE: the seminar will start at 10:00) || ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Joanna Byszuk''' (Institute of Polish Language, Polish Academy of Sciences)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Talk title will be available shortly'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''29 November 2021'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Piotr Przybyła''' (Institute of Computer Science, Polish Academy of Sciences)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''When classification accuracy is not enough: Explaining news credibility assessment and measuring users' reaction'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.|| ||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2020]].|| |
||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2025]].|| |
| Line 35: | Line 15: |
||<style="border:0;padding-top:5px;padding-bottom:5px">'''11 March 2024'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Mateusz Krubiński''' (Charles University in Prague)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''Talk title will be given shortly'''  {{attachment:seminarium-archiwum/icon-en.gif|Talk in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be made available soon.|| |
Natural Language Processing Seminar 2025–2026
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It will restart in October and will take place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube. |
15 September 2025 |
Louis Esteve (Universite Paris-Saclay) |
The field of Natural Language Processing (NLP) studies the abilities of computer systems to process and generate natural language, and has received increasing attention from the general population since the democratisation of generative and conversational models. However, behind the scenes, state-of-the-art NLP models are trained on ever-larger datasets, reaching trillions of tokens. It may be argued that the creation and use of such immense datasets is motivated by the idea that 'the larger the dataset, the more diverse it is', and that in turn 'if the training set is more diverse, it shall yield better models'. However, these statements thus far remain intuitions and need to be properly tested. To this end, this presentation will tackle methods and caveats of formal diversity quantification including limitations of the literature, a preliminary discussion on the link between diversity and dataset size, as well as their impact on downstream applications. |
Please see also the talks given in 2000–2015 and 2015–2025. |


