

Natural Language Processing Seminar 2023–2024

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube.


9 October 2023

Agnieszka Mikołajczyk-Bareła, Wojciech Janowski (VoiceLab), Piotr Pęzik (University of Łódź / VoiceLab), Filip Żarnecki, Alicja Golisowicz (VoiceLab)

http://zil.ipipan.waw.pl/seminarium-online TRURL.AI: Fine-tuning large language models on multilingual instruction datasets  Talk delivered in Polish.

This talk will summarize our recent work on fine-tuning a large generative language model on bilingual instruction datasets, which resulted in the release of an open version of Trurl (trurl.ai). The motivation behind creating this model was to improve the performance of the original Llama 2 7B- and 13B-parameter models (Touvron et al. 2023), from which it was derived, in a number of areas such as information extraction from customer-agent interactions and data labeling, with a special focus on processing texts and instructions written in Polish. We discuss the process of optimizing the instruction datasets and the effect of the fine-tuning process on a number of selected downstream tasks.

16 October 2023

Konrad Wojtasik, Vadim Shishkin, Kacper Wołowiec, Arkadiusz Janz, Maciej Piasecki (Wrocław University of Science and Technology)

http://zil.ipipan.waw.pl/seminarium-online Evaluation of information retrieval models in zero-shot settings on different documents domains  Talk delivered in English.

Information retrieval over large collections of documents is an extremely important research direction in the field of natural language processing. It is a key component in question-answering systems, where the answering model often relies on information contained in a database with up-to-date knowledge. This not only allows for updating the knowledge on which the system bases its responses to user queries but also limits its hallucinations. Currently, information retrieval models are neural networks and require significant training resources. For many years, lexical matching methods like BM25 outperformed trained neural models in open-domain settings, but current architectures and extensive datasets allow neural models to surpass lexical solutions. In the presentation, I will introduce available datasets for the evaluation and training of modern information retrieval architectures in document collections from various domains, as well as future development directions.

30 October 2023

Agnieszka Faleńska (University of Stuttgart)

http://zil.ipipan.waw.pl/seminarium-online Steps towards Bias-Aware NLP Systems  Talk in English.

For many, Natural Language Processing (NLP) systems have become everyday necessities, with applications ranging from automatic document translation to voice-controlled personal assistants. Recently, the increasing influence of these AI tools on human lives has raised significant concerns about the possible harm these tools can cause.

In this talk, I will start by showing a few examples of such harmful behaviors and discussing their potential origins. I will argue that biases in NLP models should be addressed by advancing our understanding of their linguistic sources. Then, the talk will zoom into three compelling case studies that shed light on inequalities in commonly used training data sources: Wikipedia, instructional texts, and discussion forums. Through these case studies, I will show that regardless of the perspective on the particular demographic group (speaking about, speaking to, and speaking as), subtle biases are present in all these datasets and can perpetuate harmful outcomes of NLP models.

13 November 2023

Piotr Rybak (Institute of Computer Science, Polish Academy of Sciences)

http://zil.ipipan.waw.pl/seminarium-online Advancing Polish Question Answering: Datasets and Models  Talk delivered in Polish. Slides in English.

Although question answering (QA) is one of the most popular topics in natural language processing, until recently it was virtually absent in the Polish scientific community. However, the last few years have seen a significant increase in work related to this topic. In this talk, I will discuss what question answering is, how current QA systems work, and what datasets and models are available for Polish QA. In particular, I will discuss the resources created at IPI PAN, namely the PolQA and MAUPQA datasets and the Silver Retriever model. Finally, I will point out further directions of work that are still open when it comes to Polish question answering.

11 December 2023 (a series of short invited talks by Coventry University researchers)

Xiaorui Jiang (Coventry University)

http://zil.ipipan.waw.pl/seminarium-online NLP for automating systematic reviews for evidence-based healthcare  Talk in English.

Systematic literature review (SLR) is the standard tool for synthesising medical and clinical evidence from the ocean of publications. SLR is extremely expensive. AI can play a significant role in automating the SLR process, such as for citation screening, i.e., the selection of primary studies based on title and abstract. Some tools exist, but they suffer from tremendous obstacles, including lack of trust. In addition, a specific characteristic of systematic review, which is the fact that each systematic review is a unique dataset and starts with no annotation, makes the problem even more challenging. In this study, we present some seminal but initial efforts on utilising the transfer learning and zero-shot learning capabilities of pretrained language models and large language models to solve or alleviate this challenge. Preliminary results are to be reported.

Xiaorui Jiang (Coventry University)

http://zil.ipipan.waw.pl/seminarium-online Scientific text mining and summarisation  Talk in English.

It is a difficult task to understand and summarise the development of scientific research areas. This task is especially cognitively demanding for postgraduate students and early-career researchers, one of whose main jobs is to identify such developments by reading a large amount of literature. Will AI help? We believe so. This short talk summarises some recent initial work on extracting the semantic backbone of a scientific area through the synergy of natural language processing and network analysis, which is believed to serve as a certain type of discourse model for summarisation (in future work). As a small step from it, the second part of the talk introduces how comparison citations are utilised to improve multi-document summarisation of scientific papers.

Xiaorui Jiang, Alireza Daneshkhah (Coventry University)

http://zil.ipipan.waw.pl/seminarium-online NLP for reducing GP workload: An early progress report  Talk in English.

In the face of a post-COVID global economic slowdown and an aging society, the primary care units in the National Health Service (NHS) are under increasing pressure, resulting in delays and errors in healthcare and patient management. AI can play a significant role in alleviating this investment-requirement discrepancy, especially in primary care settings. A large portion of clinical diagnosis and management can be assisted by AI tools that automate tasks and reduce delays. This short presentation reports on initial studies carried out with an NHS partner on developing NLP-based solutions for the automation of clinical intention classification (to save more time for better patient treatment and management) and an early alert application for gout flare prediction from chief complaints (to avoid delays in patient treatment and management).

8 January 2024 (a series of presentations of DARIAH.Lab project results)

DARIAH.Lab project team (Institute of Computer Science, Polish Academy of Sciences)

Talk title will be available soon  Talk delivered in Polish.

Talk summary will be made available shortly.

29 January 2024

Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences)

Talk title will be available soon  Talk delivered in Polish.

Talk summary will be made available shortly.

12 February 2024

Tsimur Hadeliya, Dariusz Kajtoch (Allegro ML Research)

Evaluation and analysis of in-context learning for Polish classification tasks  Talk in Polish.

Talk summary will be made available shortly.

Please see also the talks given in 2000–2015 and 2015–2023.