Size: 4282
Comment:
|
Size: 4919
Comment:
|
Line 3: | Line 3: |
= Natural Language Processing Seminar 2022–2023 = | = Natural Language Processing Seminar 2023–2024 = |
Line 7: | Line 7: |
||<style="border:0;padding-top:5px;padding-bottom:5px">'''3 October 2022'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Sławomir Dadas''' (National Information Processing Institute)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://www.youtube.com/watch?v=TGwLeE1Y5X4|{{attachment:seminarium-archiwum/youtube.png}}]] '''[[attachment:seminarium-archiwum/2022-10-03.pdf|Our experience with training neural sentence encoders for the Polish language]]'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">Representing sentences or short texts as dense vectors with a fixed number of dimensions is a common technique in tasks such as information retrieval, question answering, text clustering or plagiarism detection. A simple method to construct such representation is to aggregate vectors generated by a language model or extracted from word embeddings. However, higher quality representations can be obtained by fine-tuning a language model on a dataset of semantically similar sentence pairs. In this presentation, we will introduce methods for learning sentence encoders based on the Transformer architecture as well as our experiences with training such models for the Polish language. In addition, we will discuss approaches for building large datasets of paraphrases using publicly available corpora. We will also show a practical application of sentence encoders in a system developed for finding abusive clauses in consumer agreements.|| |
||<style="border:0;padding-top:5px;padding-bottom:5px">'''9 October 2023'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Agnieszka Mikołajczyk-Bareła''', '''Wojciech Janowski''' (!VoiceLab), '''Piotr Pęzik''' (University of Łódź / !VoiceLab), '''Filip Żarnecki''', '''Alicja Golisowicz''' (!VoiceLab)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Open Trurl'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The summary will be available soon.|| |
Line 12: | Line 12: |
||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2020]].|| | ||<style="border:0;padding-top:5px;padding-bottom:5px">'''16 October 2023'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Konrad Wojtasik''', '''Vadim Shishkin''', '''Kacper Wołowiec''', '''Arkadiusz Janz''', '''Maciej Piasecki''' (Wrocław University of Science and Technology)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Evaluation of information retrieval models in zero-shot settings on different documents domains'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The summary will be available soon.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''30 October 2023'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Agnieszka Faleńska''' (University of Stuttgart)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Steps towards Bias-Aware NLP Systems'''  {{attachment:seminarium-archiwum/icon-en.gif|Talk in English.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The summary will be available soon.|| ||<style="border:0;padding-top:5px;padding-bottom:5px">'''13 November 2023'''|| ||<style="border:0;padding-left:30px;padding-bottom:0px">'''Piotr Rybak''' (Institute of Computer Science, Polish Academy of Sciences)|| ||<style="border:0;padding-left:30px;padding-bottom:5px">'''Advancing Polish Question Answering: Datasets and Models'''  {{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}|| ||<style="border:0;padding-left:30px;padding-bottom:15px">The summary will be available soon.|| ||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2023]].|| |
Natural Language Processing Seminar 2023–2024
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube.
9 October 2023
Agnieszka Mikołajczyk-Bareła, Wojciech Janowski (VoiceLab), Piotr Pęzik (University of Łódź / VoiceLab), Filip Żarnecki, Alicja Golisowicz (VoiceLab)
Open Trurl
The summary will be available soon.
16 October 2023
Konrad Wojtasik, Vadim Shishkin, Kacper Wołowiec, Arkadiusz Janz, Maciej Piasecki (Wrocław University of Science and Technology)
Evaluation of information retrieval models in zero-shot settings on different documents domains
The summary will be available soon.
30 October 2023
Agnieszka Faleńska (University of Stuttgart)
Steps towards Bias-Aware NLP Systems
The summary will be available soon.
13 November 2023
Piotr Rybak (Institute of Computer Science, Polish Academy of Sciences)
Advancing Polish Question Answering: Datasets and Models
The summary will be available soon.
Please see also the talks given in 2000–2015 and 2015–2023.