Natural Language Processing Seminar 2018–2019

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). All recorded talks are available on YouTube.

1 October 2018

Janusz S. Bień (University of Warsaw – prof. emeritus)

https://www.youtube.com/watch?v=mOYzwpjTAf4 Electronic indexes to lexicographical resources  Talk delivered in Polish.

We will focus on the indexes to lexicographical resources available online in DjVu format. Such indexes can be browsed, searched, modified and created with the open source program djview4poliqarp; the origins and history of the program will be briefly presented. Index support was originally added to the program to handle the list of entries in Linde's 19th-century dictionary, but it can also be used conveniently for other resources, as will be demonstrated on selected examples. In particular, some new features introduced to the program in recent months will be presented publicly for the first time.

15 October 2018

Wojciech Jaworski, Szymon Rutkowski (University of Warsaw)

https://www.youtube.com/watch?v=SbPAdmRmW08 A multilayer rule based model of Polish inflection  Talk delivered in Polish.

The presentation will be devoted to a multilayer model of Polish inflection. The model has been developed on the basis of the Grammatical Dictionary of Polish; it does not use the concept of an inflection paradigm. The model consists of three layers of hand-made rules: an "orthographic-phonetic layer" converting a segment to a representation reflecting the morphological patterns of the language, an "analytic layer" generating the lemma and determining the affix, and an "interpretation layer" giving a morphosyntactic interpretation based on the detected affixes. The model provides knowledge about the language to a morphological analyzer supplemented with a guesser, a function that proposes lemmas and morphosyntactic interpretations for non-dictionary forms. The second use of the model is the generation of word forms based on a lemma and a morphosyntactic interpretation. The presentation will also cover the issue of disambiguating the results provided by the morphological analyzer. The demo version of the program is available on the Internet.

29 October 2018

Jakub Waszczuk (Heinrich-Heine-Universität Düsseldorf)

https://www.youtube.com/watch?v=zjGQRG2PNu0 From morphosyntactic tagging to identification of verbal multiword expressions: a discriminative approach  Talk delivered in Polish. Slides in English.

The first part of the talk was dedicated to Concraft-pl 2.0, the new version of a morphosyntactic tagger for Polish based on conditional random fields. Concraft-pl 2.0 performs morphosyntactic segmentation as a by-product of disambiguation, which allows it to be applied directly to the segmentation graphs provided by the analyser Morfeusz. This is in contrast with other existing taggers for Polish, which either neglect the problem of segmentation or rely on heuristics to perform it in a pre-processing stage. During the second part, an approach to identifying verbal multiword expressions (VMWEs) based on dependency parsing results was presented. In this approach, VMWE identification is reduced to the problem of dependency tree labeling, where one of two labels (MWE or not-MWE) must be predicted for each node in the dependency tree. The underlying labeling model can be seen as conditional random fields (as used in Concraft) adapted to tree structures. A system based on this approach ranked 1st in the closed track of the PARSEME shared task 2018.

5 November 2018

Jakub Kozakoszczak (Faculty of Modern Languages, University of Warsaw / Heinrich-Heine-Universität Düsseldorf)

https://www.youtube.com/watch?v=sz7dGmf8p3k Mornings to Wednesdays — semantics and normalization of Polish quasi-periodical temporal expression  Talk delivered in Polish.

The standard interpretations of expressions like “Januarys” and “Fridays” in temporal representation and reasoning are slices of collections of 2nd order, e.g. all the sixth elements of day sequences of cardinality 7 aligned with calendar weeks. I will present the results of work on normalizing the most frequent Polish quasi-periodical temporal expressions for online booking systems. On the linguistic side, I will argue against synonymy of the kind “Fridays” = “sixth days of the weeks” and give semantic tests for a rudimentary classification of quasi-periodicity. In the formal part, I will propose an extension to existing formalisms covering the intensional quasi-periodical operators “from”, “to”, “before” and “after” restricted to monotonic domains. In the implementation part, I will demonstrate an algorithm for the lazy generation of a generalized intersection of collections.
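As a rough illustration of what lazy intersection of temporal collections can look like, the sketch below intersects two unbounded, increasing streams of dates ("Fridays" and "days in January") without materializing either one. It is not the algorithm from the talk: the function names `weekdays`, `days_of_month` and `lazy_intersection` are hypothetical, the intersection is a plain merge-style one rather than the generalized one mentioned in the abstract, and weekdays follow Python's convention (Monday = 0, so Friday = 4).

```python
from datetime import date, timedelta
from itertools import islice


def weekdays(start, weekday):
    """Lazily yield every date on the given weekday (Monday=0) from start onward."""
    d = start + timedelta(days=(weekday - start.weekday()) % 7)
    while True:
        yield d
        d += timedelta(days=7)


def days_of_month(start, month):
    """Lazily yield every date from start onward that falls in the given month."""
    d = start
    while True:
        if d.month == month:
            yield d
        d += timedelta(days=1)


def lazy_intersection(a, b):
    """Merge-style intersection of two strictly increasing iterators.

    Only as many elements are consumed from each input as are needed
    to produce the next common element.
    """
    a, b = iter(a), iter(b)
    try:
        x, y = next(a), next(b)
        while True:
            if x == y:
                yield x
                x, y = next(a), next(b)
            elif x < y:
                x = next(a)
            else:
                y = next(b)
    except StopIteration:
        # One of the inputs is exhausted, so no further common elements exist.
        return


start = date(2019, 1, 1)
january_fridays = lazy_intersection(weekdays(start, 4), days_of_month(start, 1))
first_five = list(islice(january_fridays, 5))
```

Because both inputs are generators, `islice` is what bounds the computation here; asking for the fifth element makes the intersection skip ahead to January of the following year.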

19 November 2018

Daniel Zeman (Institute of Formal and Applied Linguistics, Charles University in Prague)

https://www.youtube.com/watch?v=xUmZ8Mxcmg0 Universal Dependencies and the Slavic Languages  Talk delivered in English.

I will present Universal Dependencies, a worldwide community effort aimed at providing multilingual corpora, annotated at the morphological and syntactic levels following unified annotation guidelines. I will discuss the concept of core arguments, one of the cornerstones of the UD framework. In the second part of the talk I will focus on some interesting problems and challenges of applying Universal Dependencies to the Slavic languages. I will discuss examples from 12 Slavic languages that are currently represented in UD and show that cross-linguistic consistency can still be improved.

3 December 2018

Ekaterina Lapshinova-Koltunski (Saarland University)

https://www.youtube.com/watch?v=UQ_6dDNEw8E Analysis and Annotation of Coreference for Contrastive Linguistics and Translation Studies  Talk delivered in English.

In this talk, I will report on ongoing work on coreference analysis in a multilingual context. I will present two approaches to the analysis of coreference and coreference-related phenomena: (1) top-down or theory-driven: here we start from some linguistic knowledge derived from existing frameworks, define the linguistic categories to analyse and create an annotated corpus that can be used either for further linguistic analysis or as training data for NLP applications; (2) bottom-up or data-driven: in this case, we start from a set of shallow features that we believe are discourse-related. We extract these structures from a huge amount of data and analyse them from a linguistic point of view, trying to describe and explain the observed phenomena in terms of existing theories and grammars.

7 January 2019

Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw), Agnieszka Patejuk (Institute of Computer Science, Polish Academy of Sciences / University of Oxford)

Enhanced Universal Dependencies  Talk delivered in Polish.

The aim of this talk is to present the two threads of our recent work on Universal Dependencies (UD), a standard for syntactically annotated corpora (http://universaldependencies.org/). The first thread is concerned with the development of a new UD treebank of Polish, one that makes extensive use of the enhanced level of representation made available in the current UD standard. The treebank is the result of conversion from an earlier ‘treebank’ of Polish, one that was annotated with constituency and functional structures as they are understood in Lexical Functional Grammar. We will outline the conversion procedure and present the resulting UD treebank of Polish. The second thread is concerned with various inconsistencies and deficiencies of UD that we identified in the process of developing the UD treebank of Polish. We will concentrate on two particularly problematic areas in UD, namely, on the core/oblique distinction, which aims to – but does not really – replace the infamous argument/adjunct dichotomy, and on coordination, a phenomenon problematic for all dependency approaches.

14 January 2019

Agata Savary (François Rabelais University, Tours)

Talk title will be available shortly  Talk delivered in Polish. Slides in English.

Talk summary will be available shortly.

21 January 2019

Marek Łaziński (University of Warsaw), Michał Woźniak (Jagiellonian University)

Talk title will be available shortly  Talk delivered in Polish.

Talk summary will be available shortly.

11 February 2019

Anna Wróblewska (Warsaw University of Technology)

Talk title will be available shortly  Talk delivered in Polish.

Talk summary will be available shortly.

25 February 2019

Jakub Dutkiewicz (Poznan University of Technology)

Empirical research on medical information retrieval  Talk delivered in Polish.

Talk summary will be available shortly.

Please see also the talks given in 2000–2015 and 2015–2018.