Natural Language Processing Seminar 2023–2024
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube.
9 October 2023
Agnieszka Mikołajczyk-Bareła, Wojciech Janowski (VoiceLab), Piotr Pęzik (University of Łódź / VoiceLab), Filip Żarnecki, Alicja Golisowicz (VoiceLab)
This talk will summarize our recent work on fine-tuning a large generative language model on bilingual instruction datasets, which resulted in the release of an open version of Trurl (trurl.ai). The motivation behind creating this model was to improve the performance of the original Llama 2 7B- and 13B-parameter models (Touvron et al. 2023), from which it was derived, in a number of areas such as information extraction from customer-agent interactions and data labeling, with a special focus on processing texts and instructions written in Polish. We discuss the process of optimizing the instruction datasets and the effect of the fine-tuning process on a number of selected downstream tasks.
30 October 2023
Agnieszka Faleńska (University of Stuttgart)
For many, Natural Language Processing (NLP) systems have become everyday necessities, with applications ranging from automatic document translation to voice-controlled personal assistants. Recently, the increasing influence of these AI tools on human lives has raised significant concerns about the possible harm these tools can cause.
In this talk, I will start by showing a few examples of such harmful behaviors and discussing their potential origins. I will argue that biases in NLP models should be addressed by advancing our understanding of their linguistic sources. Then, the talk will zoom into three compelling case studies that shed light on inequalities in commonly used training data sources: Wikipedia, instructional texts, and discussion forums. Through these case studies, I will show that regardless of the perspective on the particular demographic group (speaking about, speaking to, and speaking as), subtle biases are present in all these datasets and can perpetuate harmful outcomes of NLP models.
13 November 2023
Piotr Rybak (Institute of Computer Science, Polish Academy of Sciences)
Advancing Polish Question Answering: Datasets and Models
Although question answering (QA) is one of the most popular topics in natural language processing, until recently it was virtually absent in the Polish scientific community. However, the last few years have seen a significant increase in work related to this topic. In this talk, I will discuss what question answering is, how current QA systems work, and what datasets and models are available for Polish QA. In particular, I will discuss the resources created at IPI PAN, namely the PolQA and MAUPQA datasets and the Silver Retriever model. Finally, I will point out further directions of work that are still open when it comes to Polish question answering.
11 December 2023 (a series of short invited talks by Coventry University researchers)
Xiaorui Jiang (Coventry University)
NLP for automating systematic reviews for evidence-based healthcare
Talk summary will be made available shortly.
Xiaorui Jiang (Coventry University)
Scientific text mining and summarisation
Talk summary will be made available shortly.
Xiaorui Jiang, Alireza Daneshkhah (Coventry University)
NLP for reducing GP workload: An early progress report
Talk summary will be made available shortly.
8 January 2024 (a series of presentations of DARIAH.Lab project results)
DARIAH.Lab project team (Institute of Computer Science, Polish Academy of Sciences)
Talk title will be available soon
Talk summary will be made available shortly.
29 January 2024
Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences)
Talk title will be available soon
Talk summary will be made available shortly.
Please see also the talks given in 2000–2015 and 2015–2023.