Natural Language Processing Seminar 2015–2016
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on selected Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa).
12 October 2015
Vincent Ng (University of Texas at Dallas)
Recent years have seen considerable progress on the notoriously difficult task of coreference resolution owing in part to the availability of coreference-annotated corpora such as MUC, ACE, and OntoNotes. Coreference, however, is more than MUC/ACE/OntoNotes coreference: it encompasses many interesting cases of anaphora that are not covered in the extensively investigated MUC/ACE/OntoNotes entity coreference task. This talk examined several comparatively less-studied coreference tasks that were arguably no less challenging than the MUC/ACE/OntoNotes entity coreference task, including the Winograd Schema Challenge, zero anaphora resolution, and event coreference resolution.
26 October 2015
Wojciech Jaworski (University of Warsaw)
The author presented the parser being developed within the CLARIN-PL project, its morphological pre-processing, the categorial grammar of Polish used by the parser and integrated with a valency dictionary, and the semantic graph formalism used for meaning representation. He also discussed the algorithms used by the parser and optimization strategies related both to performance and to the concise representation of ambiguous syntactic and semantic parse trees.
16 November 2015
Izabela Gatkowska (Jagiellonian University in Kraków)
The empirical network of lexical links is the result of an experiment that uses the human associative mechanism: the subject says the first word that comes to mind after understanding the stimulus word. The study was conducted cyclically, i.e. the response words obtained in the first cycle were used as stimuli in the second cycle. This made it possible to build a semantic network that differs both from networks derived from text corpora, such as WORTSCHATZ, and from networks constructed by hand, such as WordNet. The empirically obtained connections between words in the network have a direction and a strength. The set of incoming and outgoing connections of a given expression forms a lexical node of the network (a subnetwork). The way the network characterizes meaning was illustrated with feedback connections, a specific kind of dependency that appears between two words within a lexical node. In a qualitative analysis, the semantic lexical relations known in linguistics, and employed for example in WordNet, made it possible to interpret only about 25% of the feedback links. The remaining links may be interpreted by referring to the model of meaning description proposed in FrameNet. A qualitative interpretation of all the links found in a lexical node may enable a comparative study of lexical nodes in networks constructed experimentally for different natural languages, and may also make it possible to distinguish the empirical semantic models underlying the sets of links found between nodes in a given network.
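To make the notions of "lexical node" and "feedback connection" more concrete, here is a minimal Python sketch. It assumes a hypothetical data model in which the association network is a directed graph whose edge weights are response counts, and it reads a feedback connection as a pair of words linked in both directions; the class name, method names, words and counts are all invented for illustration and do not come from the study.

```python
from collections import defaultdict


class AssociationNetwork:
    """Directed, weighted association graph (hypothetical data model).

    An edge stimulus -> response carries a strength, here the number of
    subjects who gave that response to that stimulus.
    """

    def __init__(self) -> None:
        self.out_edges: defaultdict[str, dict[str, int]] = defaultdict(dict)
        self.in_edges: defaultdict[str, dict[str, int]] = defaultdict(dict)

    def add(self, stimulus: str, response: str, count: int) -> None:
        """Record `count` occurrences of `response` given to `stimulus`."""
        out = self.out_edges[stimulus]
        out[response] = out.get(response, 0) + count
        inc = self.in_edges[response]
        inc[stimulus] = inc.get(stimulus, 0) + count

    def lexical_node(self, word: str) -> dict[str, dict[str, int]]:
        """Return the incoming and outgoing connections of `word` (its subnetwork)."""
        return {
            "in": dict(self.in_edges.get(word, {})),
            "out": dict(self.out_edges.get(word, {})),
        }

    def feedback_links(self, word: str) -> list[str]:
        """Words linked to `word` in both directions (word -> w and w -> word)."""
        node = self.lexical_node(word)
        return sorted(node["in"].keys() & node["out"].keys())


# Hypothetical responses, invented for illustration only.
net = AssociationNetwork()
net.add("house", "family", 30)
net.add("family", "house", 25)
net.add("house", "window", 10)
net.add("garden", "house", 8)

print(net.lexical_node("house"))
print(net.feedback_links("house"))
```

Keeping separate incoming and outgoing maps makes both the lexical node of a word and its feedback links cheap to read off; in this toy network only "family" is linked to "house" in both directions.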
30 November 2015
Dora Montagna (Universidad Autónoma de Madrid)
The author presented a theoretical model of meaning representation based on Pustejovsky's theory of the Generative Lexicon. The proposal is intended as a basis for automatic disambiguation, but also as a new model of lexicographic description. The model will be applied to a highly productive Spanish verb, assuming the hypothesis of verbal underspecification, in order to establish patterns of semantic behaviour.
7 December 2015
Łukasz Kobyliński (Institute of Computer Science, Polish Academy of Sciences), Witold Kieraś (University of Warsaw)
Morphosyntactic tagging of Polish – state of the art and future perspectives |
The presentation discussed the state of the art in automatic morphosyntactic tagging of Polish text, with a particular focus on the performance of publicly available tools that can be used in real-world applications. A qualitative and quantitative analysis of the errors made by the taggers was conducted, along with a discussion of the possible causes of these errors and solutions to them. Tagging results for Polish were compared and contrasted with those for other European languages.
8 December 2015 (Tuesday)
Salvador Pons Bordería (Universitat de València)
Discourse Markers from a pragmatic perspective: The role of discourse units in defining functions
One of the most disregarded aspects in the description of discourse markers (DMs) is position. Notions such as "initial position" or "final position" are meaningless unless one can specify with regard to what a DM is "initial" or "final". The presentation defended the idea that, to answer this question, appeal must be made to the notion of "discourse unit". Given a set of a) discourse units and b) discourse positions, determining the function of a given DM is quasi-automatic.
11 January 2016
Małgorzata Marciniak, Agnieszka Mykowiecka, Piotr Rychlik (Institute of Computer Science, Polish Academy of Sciences)
The presentation addressed the problems of terminology extraction from Polish domain corpora. The authors described the C-value method, which ranks term candidates based on their frequency and the number of contexts in which they appear. The method takes into account nested terms that may not appear on their own in the data. With this method, several nested grammatical subphrases are obtained which are syntactically correct but semantically odd, like 'USG jamy' ('USG of cavity'). The recognition of nested terms is supported by a word connection strength measure, which makes it possible to eliminate truncated phrases from the top of the term list. The talk concluded with a demo of the TermoPL tool.
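In its original formulation (Frantzi and Ananiadou), the C-value of a candidate term a is log2|a| · f(a) if a is not nested in any longer candidate, and log2|a| · (f(a) − (1/|T_a|) · Σ_{b∈T_a} f(b)) otherwise, where |a| is the length of a in words, f is corpus frequency and T_a is the set of longer candidates containing a. The Python sketch below implements only that textbook definition; it is not TermoPL's code, which may use a modified variant, and the word sequences and counts are hypothetical.

```python
import math
from collections.abc import Mapping

Term = tuple[str, ...]


def is_nested(inner: Term, outer: Term) -> bool:
    """True if `inner` is a contiguous, strictly shorter word sequence inside `outer`."""
    n, m = len(inner), len(outer)
    if n >= m:
        return False
    return any(outer[i:i + n] == inner for i in range(m - n + 1))


def c_value(freqs: Mapping[Term, int]) -> dict[Term, float]:
    """Classic C-value for each candidate; `freqs` maps word tuples to corpus counts."""
    scores: dict[Term, float] = {}
    for a, f_a in freqs.items():
        # T_a: the longer candidates in which `a` occurs nested.
        containers = [b for b in freqs if is_nested(a, b)]
        # log2 of the length in words (so one-word candidates score 0 here).
        length_factor = math.log2(len(a))
        if not containers:
            scores[a] = length_factor * f_a
        else:
            # Subtract the average frequency of the containing candidates.
            avg_container_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = length_factor * (f_a - avg_container_freq)
    return scores


# Hypothetical counts, invented for illustration only.
freqs = {
    ("USG", "jamy", "brzusznej"): 10,
    ("USG", "jamy"): 10,
    ("jamy", "brzusznej"): 12,
}
for term, score in sorted(c_value(freqs).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

With these invented counts, 'USG jamy' never occurs outside the longer phrase, so its frequency is fully discounted and it scores zero, while 'jamy brzusznej' keeps a small positive score from its two independent occurrences.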
25 January 2016
Wojciech Jaworski (University of Warsaw) |
Syntactic-semantic parser for Polish: integration with lexical resources, parsing |
During the lecture the author presented the integration of the syntactic-semantic parser with SGJP, Polimorf, Słowosieć and Walenty, as well as preliminary observations on the impact that checking semantic preferences has on parsing. He also described the categorial formalism used for parsing and briefly presented how the parser works.
22 February 2016 |
Witold Dyrka (Wrocław University of Technology) |
Language(s) of proteins? – premises, contributions and perspectives |
In his talk the author presented arguments in favour of treating protein sequences, or higher-order protein structures, as sentences in some language(s). He then showed several interesting results (his own and others') of applying quantitative methods of text analysis and formal linguistics tools (such as probabilistic context-free grammars) to the analysis of proteins. Finally, he presented plans for his further work on "protein linguistics", which, as he hoped, would inspire an interesting discussion.
22 February 2016 |
Linguistic Engineering Group (Institute of Computer Science, Polish Academy of Sciences) |
Extended seminar |
12:00–12:15: People, projects, tools |
12:15–12:45: Morfeusz 2: analyzer and inflectional synthesizer for Polish |
12:45–13:15: Toposław: Creating MWU lexicons |
13:15–13:45: Lunch break |
13:45–14:15: TermoPL: Terminology extraction from Polish data |
14:15–14:45: Walenty: Valency dictionary of Polish |
14:45–15:15: POLFIE: LFG grammar for Polish |
7 March 2016 |
Zbigniew Bronk (Grammatical Dictionary of Polish team member) |
JOD, a markup language for Polish declension, was constructed to describe precisely the inflectional rules and schemes for Polish nouns and adjectives. Its first application was the description of the inflection of surnames, taking into account the sex of the person or persons bearing the given surname. This model has been the basis for the "Automaton of declension of Polish surnames". The author presented the general idea of the language and the implementation of its interpreter, as well as the JOD editor and the "Automaton of declension of Polish surnames" website.
21 March 2016 |
Bartosz Zaborowski, Aleksander Zabłocki (Institute of Computer Science, Polish Academy of Sciences) |
In this talk the authors presented Poliqarp 2, a linguistic data search engine they had been working on for the previous three years. They described both technical aspects and interesting features from the user's point of view. They briefly recalled the data model supported by the engine, the structure of the query language supported by the new engine, its expressive power, and the differences from the previous version. In particular, they focused on elements added or modified during the development of the project (support for the Składnica and LFG data models, post-processing, syntactic sugar). On the technical side, they briefly presented the software architecture and some details of the index implementation. They also described non-trivial decisions related to input data processing (the National Corpus of Polish in particular). They ended the talk by presenting the results of preliminary efficiency measurements.
4 April 2016 |
Aleksander Wawer (Institute of Computer Science, Polish Academy of Sciences) |
Identification of opinion targets in Polish |
The seminar concluded and summarised the results of a recently completed National Science Centre (NCN) grant. It presented three resources with labelled sentiments and opinion targets developed within the project: a dependency treebank built from a corpus of product reviews, a subset of the Składnica dependency treebank, and a collection of tweets. The seminar also included a discussion of experiments on the automatic recognition of opinion targets. These involved two parsing methods, dependency and shallow parsing, as well as a hybrid method in which the results of syntactic analysis are used by statistical models (e.g. CRF).
21 April 2016 (Thursday) |
Magdalena Derwojedowa (University of Warsaw) |
“Tem lepiej, ale jest to interes miljonowy i traktujemy go poważnie” (“So much the better, but it is a business worth a million and we take it seriously”) – A thousand words a thousand times in 5 parts.
Summary of the talk will be available shortly. |
9 May 2016 |
Daniel Janus (Rebased.pl) |
From unstructured data to searchable metadata-rich corpus: Skyscraper, P4, Smyrna |
Summary of the talk will be available shortly. |
19 May 2016 (Thursday) |
Kamil Kędzia, Konrad Krulikowski (University of Warsaw) |
Title of the talk will be available shortly. |
Summary of the talk will be available shortly. |
6 June 2016 |
Karol Opara (Systems Research Institute of the Polish Academy of Sciences) |
Grammatical rhymes in Polish poetry – a quantitative analysis |
Summary of the talk will be available shortly. |