Differences between revisions 188 and 736 (spanning 548 versions)

Natural Language Processing Seminar 2025–2026

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It will restart in October and will take place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on YouTube.

Please see also the talks given in 2000–2015 and 2015–2025.

-  ⇤ ← Revision 188 as of 2018-10-09 10:54:56 → 
  Size: 8070
  Editor: MaciejOgrodniczuk
  Comment:
+   ← Revision 736 as of 2025-08-18 22:54:05 → ⇥
  Size: 3299
  Editor: MaciejOgrodniczuk
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-= Natural Language Processing Seminar 2018–2019 =
+= Natural Language Processing Seminar 2025–2026 =
 Line 5:
-||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). All recorded talks are available [[https://www.youtube.com/channel/UC5PEPpMqjAr7Pgdvq0wRn0w|on YouTube]]. ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]||
+||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It will restart in October and will take place on (some) Mondays, usually at 10:15 am, often online – please use the link next to the presentation title. All recorded talks are available on [[https://www.youtube.com/ipipan|YouTube]]. ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]||
 Line 7:
-||<style="border:0;padding-top:5px;padding-bottom:5px">'''1 October 2018'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Janusz S. Bień''' (University of Warsaw – prof. emeritus)||
||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://www.youtube.com/watch?v=mOYzwpjTAf4|{{attachment:seminarium-archiwum/youtube.png}}]] '''[[attachment:seminarium-archiwum/2018-10-01.pdf|Electronic indexes to lexicographical resources]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">We will focus on the indexes to lexicographical resources available online in !DjVu format. Such indexes can be browsed, searched, modified and created with the djview4poliqarp open source program; the origins and the history of the program will be briefly presented. Originally the index support was added to the program to handle the list of entries in the 19th century Linde's dictionary, but can be used conveniently also for other resources, as will be demonstrated on selected examples. In particular some new features, introduced to the program in the last months, will be presented publicly for the first time.||
+||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2025]].||
-Line 12:
+Line 9:
-||<style="border:0;padding-top:5px;padding-bottom:5px">'''15 October 2018'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Wojciech Jaworski, Szymon Rutkowski''' (University of Warsaw)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''A multilayer rule based model of Polish inflection''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The presentation will be devoted to the multilayer model of Polish inflection. The model has been developed on the basis of Grammatical Dictionary of Polish; it does not use the concept of an inflection paradigm. The model consists of three layers of hand-made rules: "orthographic-phonetic layer" converting a segment to representation reflecting morphological patterns of the language, "analytic layer" generating lemma and determining affix and "interpretation layer" giving a morphosyntactic interpretation based on detected affixes. The model provides knowledge about the language to a morphological analyzer supplemented with the function of guessing lemmas and morphosyntactic interpretations for non-dictionary forms (guesser). The second use of the model is generation of word forms based on lemma and morphosyntactic interpretation. The presentation will also cover the issue of disambiguation of the results provided by the morphological analyzer. The demo version of the program is available on the Internet.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''29 October 2018'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Jakub Waszczuk''' (Heinrich-Heine-Universität Düsseldorf)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Integrating multiword expression in syntactic parsing using A* and discriminative modeling methods''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">During the seminar I will present two different approaches to identifying verbal multiword expressions (VMWEs) in coordination with syntactic parsing. The first approach consists in promoting MWEs in A* TAG (tree-adjoining grammar) parsing. It assumes that potential MWE candidates are pre-identified prior to parsing (e.g., as a result of MWE-aware supertagging). The experiments performed on Składnica showed that this strategy allows the parser's search space to be pruned significantly with little loss in syntactic parsing accuracy. In the second approach, VMWE identification is deferred to a post-processing phase in which (dependency) parsing results are already determined. VMWE identification is then reduced to the problem of dependency tree labeling, where one of two labels (MWE or not-MWE) must be predicted for each node in the dependency tree. A system based on this approach, using multiclass logistic regression for tree labeling, ranked 1st in the closed track of the PARSEME shared task 2018. Part of the talk will also be dedicated to Concraft-pl 2.0, the new version of a morphosyntactic tagger for Polish based on conditional random fields. Concraft-pl 2.0 performs morphosyntactic segmentation as a by-product of disambiguation, which allows it to be used directly on the segmentation graphs provided by Morfeusz. This is in contrast with other existing taggers for Polish, which either neglect the problem of segmentation or rely on heuristics to perform it in a pre-processing stage.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''5 November 2018'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Jakub Kozakoszczak''' (Faculty of Modern Languages, University of Warsaw / Heinrich-Heine-Universität Düsseldorf)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Talk title will be available shortly''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''19 November 2018'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Daniel Zeman''' (Institute of Formal and Applied Linguistics, Charles University, Czech Republic)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Talk title will be available shortly''' &#160;{{attachment:seminarium-archiwum/icon-en.gif|Talk delivered in English.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be available shortly.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''3 December 2018'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Ekaterina Lapshinova-Koltunski''' (Saarland University)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Analysis and Annotation of Coreference for Contrastive Linguistics and Translation Studies''' &#160;{{attachment:seminarium-archiwum/icon-en.gif|Talk delivered in English.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">In this talk, I will report on the ongoing work on coreference analysis in a multilingual context. I will present two approaches in the analysis of coreference and coreference-related phenomena: (1) top-down or theory-driven: here we start from some linguistic knowledge derived from the existing frameworks, define linguistic categories to analyse and create an annotated corpus that can be used either for further linguistic analysis or as training data for NLP applications; (2) bottom-up or data-driven: in this case, we start from a set of features of shallow character that we believe are discourse-related. We extract these structures from a huge amount of data and analyse them from a linguistic point of view trying to describe and explain the observed phenomena from the point of view of existing theories and grammars.||
+{{{#!wiki comment
-Line 38:
+Line 12:
-||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given in 2000–2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015–2018]].||
+||<style="border:0;padding-top:5px;padding-bottom:5px">'''11 March 2024'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Mateusz Krubiński''' (Charles University in Prague)||
||<style="border:0;padding-left:30px;padding-bottom:5px">[[http://zil.ipipan.waw.pl/seminarium-online|{{attachment:seminarium-archiwum/teams.png}}]] '''Talk title will be given shortly''' &#160;{{attachment:seminarium-archiwum/icon-en.gif|Talk in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Talk summary will be made available soon.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''2 April 2020'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Stan Matwin''' (Dalhousie University)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Efficient training of word embeddings with a focus on negative examples''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}} {{attachment:seminarium-archiwum/icon-en.gif|Slides in English.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">This presentation is based on our [[https://pdfs.semanticscholar.org/1f50/db5786913b43f9668f997fc4c97d9cd18730.pdf|AAAI 2018]] and [[https://aaai.org/ojs/index.php/AAAI/article/view/4683|AAAI 2019]] papers on English word embeddings. In particular, we examine the notion of “negative examples”, the unobserved or insignificant word-context co-occurrences, in spectral methods. We provide a new formulation for the word embedding problem by proposing a new intuitive objective function that perfectly justifies the use of negative examples. With the goal of efficient learning of embeddings, we propose a kernel similarity measure for the latent space that can effectively calculate the similarities in high dimensions. Moreover, we propose an approximate alternative to our algorithm using a modified Vantage Point tree and reduce the computational complexity of the algorithm with respect to the number of words in the vocabulary. We have trained various word embedding algorithms on articles of Wikipedia with 2.3 billion tokens and show that our method outperforms the state-of-the-art in most word similarity tasks by a good margin. We will round up our discussion with some general thoughts about the use of embeddings in modern NLP.||
}}}

Diff for "seminar"
