Differences between revisions 567 and 568

Natural Language Processing Seminar 2023–2024

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube.

9 October 2023

Agnieszka Mikołajczyk-Bareła, Wojciech Janowski (VoiceLab), Piotr Pęzik (University of Łódź / VoiceLab), Filip Żarnecki, Alicja Golisowicz (VoiceLab)

TRURL.AI: Fine-tuning large language models on multilingual instruction datasets

This talk will summarize our recent work on fine-tuning a large generative language model on bilingual instruction datasets, which resulted in the release of an open version of Trurl (trurl.ai). The motivation behind creating this model was to improve on the performance of the original Llama 2 7B- and 13B-parameter models (Touvron et al. 2023), from which it was derived, in a number of areas such as information extraction from customer-agent interactions and data labeling, with a special focus on processing texts and instructions written in Polish. We discuss the process of optimizing the instruction datasets and the effect of the fine-tuning process on a number of selected downstream tasks.

16 October 2023

Konrad Wojtasik, Vadim Shishkin, Kacper Wołowiec, Arkadiusz Janz, Maciej Piasecki (Wrocław University of Science and Technology)

Evaluation of information retrieval models in zero-shot settings on different documents domains

Information retrieval over large collections of documents is an extremely important research direction in the field of natural language processing. It is a key component of question-answering systems, where the answering model often relies on information contained in a database with up-to-date knowledge. This not only allows the knowledge on which the system bases its responses to user queries to be updated, but also limits its hallucinations. Currently, information retrieval models are neural networks and require significant training resources. For many years, lexical matching methods like BM25 outperformed trained neural models in the open-domain setting, but current architectures and extensive datasets allow neural models to surpass lexical solutions. In the presentation, I will introduce available datasets for the evaluation and training of modern information retrieval architectures in document collections from various domains, as well as future development directions.

30 October 2023

Agnieszka Faleńska (University of Stuttgart)

Steps towards Bias-Aware NLP Systems

The summary will be available soon.

13 November 2023

Piotr Rybak (Institute of Computer Science, Polish Academy of Sciences)

Advancing Polish Question Answering: Datasets and Models

The summary will be available soon.

Please see also the talks given in 2000–2015 and 2015–2023.

- Revision 567 as of 2023-10-17 09:28:17
  Size: 6807
  Editor: MaciejOgrodniczuk
  Comment:
+ Revision 568 as of 2023-10-19 09:11:09
  Size: 6899
  Editor: MaciejOgrodniczuk
  Comment:
 Line 19:
-||<style="border:0;padding-left:30px;padding-bottom:5px">'''Steps towards Bias-Aware NLP Systems''' &#160;{{attachment:seminarium-archiwum/icon-en.gif|Talk in English.}}||
+||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''Steps towards Bias-Aware NLP Systems''' &#160;{{attachment:seminarium-archiwum/icon-en.gif|Talk in English.}}||
