Diff for "seminar"

Differences between revisions 1 and 55 (spanning 54 versions)
Revision 1 as of 2016-06-27 22:35:36
Size: 834
Comment:
Revision 55 as of 2017-02-22 13:42:40
Size: 17113
Comment:
Line 5: Line 5:
||<style="border:0;padding:0">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). ||<style="border:0;padding-left:30px">[[seminarium-archiwum|{{attachment:pl.png}}]]|| ||<style="border:0;padding-bottom:10px">The NLP Seminar is organised by the [[http://nlp.ipipan.waw.pl/|Linguistic Engineering Group]] at the [[http://www.ipipan.waw.pl/en/|Institute of Computer Science]], [[http://www.pan.pl/index.php?newlang=english|Polish Academy of Sciences]] (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). ||<style="border:0;padding-left:30px">[[seminarium|{{attachment:seminar-archive/pl.png}}]]||
Line 7: Line 7:
||<style="border:0;padding-top:10px">Please come back in October! And now see [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given between 2000 and 2015]] and [[http://zil.ipipan.waw.pl/seminar|2015-16]]. ||<style="border:0;padding-top:5px;padding-bottom:5px">'''10 October 2016'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Katarzyna Pakulska''', '''Barbara Rychalska''', '''Krystyna Chodorowska''', '''Wojciech Walczak''', '''Piotr Andruszkiewicz''' (Samsung)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''[[attachment:seminarium-archiwum/2016-10-10.pdf|Paraphrase Detection Ensemble – SemEval 2016 winner]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">This seminar describes the winning solution designed for a core track within the !SemEval 2016 English Semantic Textual Similarity (STS) task. The goal of the competition was to measure semantic similarity between two given sentences on a scale from 0 to 5. At the same time the solution should replicate human language understanding. The presented model is a novel hybrid of recursive auto-encoders from deep learning (RAE) and a !WordNet award-penalty system, enriched with a number of other similarity models and features used as input for Linear Support Vector Regression.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''24 October 2016'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Adam Przepiórkowski, Jakub Kozakoszczak, Jan Winkowski, Daniel Ziembicki, Tadeusz Teleżyński''' (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''[[attachment:seminarium-archiwum/2016-10-24.pdf|Corpus of formalized textual entailment steps]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The authors present resources created within the CLARIN project to help with the qualitative evaluation of RTE systems: two corpora of textual derivations and a corpus of textual entailment rules. A textual derivation is a series of atomic steps that connects the Text with the Hypothesis in a textual entailment pair. The original pairs are taken from the FraCaS corpus and a Polish translation of the RTE3 corpus. A textual entailment rule sanctions the textual entailment relation between the input and the output of a step, using syntactic patterns written in the UD standard together with other semantic, logical and contextual constraints expressed in FOL.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''7 November 2016'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Rafał Jaworski''' (Adam Mickiewicz University in Poznań)||
||<style="border:0;padding-left:30px;padding-bottom:5px">''' [[attachment:seminarium-archiwum/2016-11-07.pdf|Concordia – translation memory search algorithm]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The talk covers the Concordia algorithm (http://tmconcordia.sourceforge.net/), which is used to maximize the productivity of a human translator. The algorithm combines the features of standard fuzzy translation memory searching with a concordancer. As the key non-functional requirement of computer-aided translation mechanisms is performance, Concordia incorporates upgraded versions of standard approximate searching techniques, aiming at reducing the computational complexity.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''21 November 2016'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Norbert Ryciak, Aleksander Wawer''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://www.youtube.com/watch?v=hGKzZxFa0ik|{{attachment:seminarium-archiwum/youtube.png}}]] '''[[attachment:seminarium-archiwum/2016-11-21.pdf|Using recursive deep neural networks and syntax to compute phrase semantics]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The seminar presents initial experiments on recursive phrase-level sentiment computation using dependency syntax and deep learning. We discuss neural network architectures and implementations created within Clarin 2 and present results on English language resources. The seminar also covers ongoing work on Polish language resources.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''5 December 2016'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Dominika Rogozińska''', '''Marcin Woliński''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''[[attachment:seminarium-archiwum/2016-12-05.pdf|Methods of syntax disambiguation for constituent parse trees in Polish as post–processing phase of the Świgra parser]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The presentation shows methods of syntactic disambiguation of the parses that the Świgra parser produces for Polish utterances. The presented methods include probabilistic context-free grammars and maximum entropy models. The best of the described models achieves an efficiency measure of 96.2%. The outcome of our experiments is a module for post-processing Świgra's parses.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''9 January 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Agnieszka Pluwak''' (Institute of Slavic Studies, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">''' [[attachment:seminarium-archiwum/2017-01-09.pdf|Building a domain-specific knowledge representation using an extended method of frame semantics on a corpus of Polish, English and German lease agreements]]''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The !FrameNet project is defined by its authors as a lexical base with some ontological features (though not an ontology sensu stricto, owing to its selective approach to the description of frames and lexical units, as well as of frame-to-frame relations). Ontologies, as knowledge representations in the field of NLP, should be applicable to specific domains and texts. However, in the !FrameNet bibliography published before January 2016 I did not find a single knowledge representation based entirely on frames or on an extensive structure of frame-to-frame relations. I did find a few examples of domain-specific knowledge representations that use selected !FrameNet frames, such as !BioFrameNet or Legal !FrameNet, where frames were applied to connect data from different sources. Therefore, in my dissertation, I decided to conduct an experiment and build a knowledge representation of frame-to-frame relations for the domain of lease agreements. The aim of my study was to describe frames useful for building a possible system for extracting data from lease agreements, that is, frames containing answers to the questions a professional analyst asks while reading lease agreements. In my work I asked several questions: Would I be able to use !FrameNet frames for this purpose, or would I have to build my own frames? Will the analysis of Polish cause language-specific problems? How will the professional language affect the use of frames in context?||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''23 January 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Marek Rogalski''' (Lodz University of Technology)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Automatic paraphrasing''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Paraphrasing is conveying the essential meaning of a message using different words. The ability to paraphrase is a measure of understanding. A teacher who asks a student "could you please tell us using your own words ..." is testing whether the student has understood the topic. In this presentation we will discuss the task of automatic paraphrasing. We will differentiate between syntax-level paraphrases and essential-meaning-level paraphrases. We will bring up several techniques from seemingly unrelated fields that can be applied in automatic paraphrasing. We will also show results that we have been able to produce with those techniques.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''6 February 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Łukasz Kobyliński''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://www.youtube.com/watch?v=TP9pmPKla1k|{{attachment:seminarium-archiwum/youtube.png}}]] '''Korpusomat''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">[[http://korpusomat.nlp.ipipan.waw.pl/|Korpusomat]] is a web tool facilitating unassisted creation of corpora for linguistic studies. After a set of text files is uploaded, the files are automatically analysed morphologically and lemmatised using Morfeusz, and disambiguated using the Concraft tagger. The resulting corpus can then be downloaded and analysed offline using the Poliqarp search engine to query for information related to text segmentation, base forms, inflectional interpretations and (dis)ambiguities. Poliqarp is also capable of calculating frequencies and applying basic statistical measures necessary for quantitative analysis. Apart from plain text files, Korpusomat can also process more complex textual formats such as popular EPUBs, download source data from the Internet, strip unnecessary information and extract document metadata.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''20 February 2017''' (NOTE: the talk was delivered at [[https://ipipan.waw.pl/en/institute/scientific-activities/seminars/institute-seminar|the Institute seminar]])||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Elżbieta Hajnicz''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">[[https://youtu.be/lDKQ9jhIays|{{attachment:seminarium-archiwum/youtube.png}}]] '''Representation language of the valency dictionary Walenty''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The Polish Valence Dictionary (Walenty) is intended to be used by natural language processing tools, particularly parsers, and thus it offers a formalized representation of the valency information. The talk presented the notion of valency and its representation in the dictionary, along with examples illustrating how particular syntactic and semantic language phenomena are modelled.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''2 March 2017''' (NOTE: the seminar will be held on Thursday, 10:15 am)||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Wojciech Jaworski''' (University of Warsaw)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Integration of dependency parser with a categorial parser''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">In this talk I will describe the division of texts into sentences and the control of each parser's execution within the hybrid parser emerging in the Clarin-bis project. I will describe the adopted method of converting dependency structures to make them compatible with the structures of the categorial parser. The conversion has two aspects: changing the attributes of each node and changing the links between nodes. I will show how the method can be extended to convert the compressed forests generated by the Świgra parser. Finally, I will talk about the plans and goals for reimplementing the !MateParser algorithm.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''13 March 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Marek Kozłowski''' (National Information Processing Institute)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Internet model of Polish and semantic text processing''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||

||<style="border:0;padding-left:30px;padding-bottom:15px">The presentation shows how [[http://babelnet.org/|BabelNet]] (a multilingual encyclopaedia and semantic network based on publicly available data sources such as Wikipedia and !WordNet) can be used for grouping short texts, sentiment analysis, or emotional profiling of movies based on their subtitles. The second part presents work based on [[http://commoncrawl.org/|CommonCrawl]], a publicly available petabyte-size open repository of multilingual Web pages. !CommonCrawl was used to build two models of Polish: n-gram-based and semantic distribution-based.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''20 March 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Jakub Szymanik'''||
||<style="border:0;padding-left:30px;padding-bottom:5px">Exploring the Relation of Semantic Complexity and Quantifier Distribution in Large Corpora. &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">In this joint paper with Camilo Thorne, we study whether semantic complexity influences the distribution of generalized quantifiers in a large English corpus derived from Wikipedia. We consider the minimal computational device recognizing a generalized quantifier as the core measure of its semantic complexity. We regard quantifiers that belong to three increasingly more complex classes: Aristotelian (recognizable by 2-state acyclic finite automata), counting (k+2-state finite automata), and proportional quantifiers (pushdown automata). Using regression analysis we show that semantic complexity is a statistically significant factor explaining 27.29% of frequency variation. We compare this impact to that of other known sources of complexity, both semantic (quantifier monotonicity and the comparative/superlative distinction) and superficial (e.g., the length of quantifier surface forms). In general, we observe that the more complex a quantifier, the less frequent it is.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''27 March 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Paweł Morawiecki''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Introduction to deep neural networks''' &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">In the last few years, Deep Neural Networks (DNNs) have become the tool that provides the best solutions to many problems in image and speech recognition. In natural language processing, too, DNNs have revolutionized how translation and word representation are done, along with many other tasks. This presentation aims to provide good intuitions about DNNs, their core architectures and how they operate. I will discuss and suggest tools and source materials that can help in further exploration of the topic and in independent experiments.||

||<style="border:0;padding-top:5px;padding-bottom:5px">'''10 April 2017'''||
||<style="border:0;padding-left:30px;padding-bottom:0px">'''Paweł Morawiecki''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">Talk title will be available shortly. &#160;{{attachment:seminarium-archiwum/icon-pl.gif|Talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The summary will be available shortly.||

||<style="border:0;padding-top:10px">Please see also [[http://nlp.ipipan.waw.pl/NLP-SEMINAR/previous-e.html|the talks given between 2000 and 2015]] and [[http://zil.ipipan.waw.pl/seminar-archive|2015-16]].||

Natural Language Processing Seminar 2016–2017

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa).

10 October 2016

Katarzyna Pakulska, Barbara Rychalska, Krystyna Chodorowska, Wojciech Walczak, Piotr Andruszkiewicz (Samsung)

Paraphrase Detection Ensemble – SemEval 2016 winner  Talk delivered in Polish.

This seminar describes the winning solution designed for a core track within the SemEval 2016 English Semantic Textual Similarity (STS) task. The goal of the competition was to measure semantic similarity between two given sentences on a scale from 0 to 5. At the same time the solution should replicate human language understanding. The presented model is a novel hybrid of recursive auto-encoders from deep learning (RAE) and a WordNet award-penalty system, enriched with a number of other similarity models and features used as input for Linear Support Vector Regression.

24 October 2016

Adam Przepiórkowski, Jakub Kozakoszczak, Jan Winkowski, Daniel Ziembicki, Tadeusz Teleżyński (Institute of Computer Science, Polish Academy of Sciences / University of Warsaw)

Corpus of formalized textual entailment steps  Talk delivered in Polish.

The authors present resources created within the CLARIN project to help with the qualitative evaluation of RTE systems: two corpora of textual derivations and a corpus of textual entailment rules. A textual derivation is a series of atomic steps that connects the Text with the Hypothesis in a textual entailment pair. The original pairs are taken from the FraCaS corpus and a Polish translation of the RTE3 corpus. A textual entailment rule sanctions the textual entailment relation between the input and the output of a step, using syntactic patterns written in the UD standard together with other semantic, logical and contextual constraints expressed in FOL.

7 November 2016

Rafał Jaworski (Adam Mickiewicz University in Poznań)

Concordia – translation memory search algorithm  Talk delivered in Polish.

The talk covers the Concordia algorithm (http://tmconcordia.sourceforge.net/), which is used to maximize the productivity of a human translator. The algorithm combines the features of standard fuzzy translation memory searching with a concordancer. As the key non-functional requirement of computer-aided translation mechanisms is performance, Concordia incorporates upgraded versions of standard approximate searching techniques, aiming at reducing the computational complexity.
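
To make the idea of fuzzy translation memory search concrete, here is a minimal Python sketch. It does not reproduce the Concordia algorithm or its optimised approximate searching; it only illustrates the general notion of retrieving stored segments that resemble a new sentence. The function name `fuzzy_lookup`, the similarity threshold and the sample memory are all assumptions made for illustration, and the similarity score comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(memory, query, threshold=0.7):
    """Return (score, source, target) tuples for stored segments similar to the query.

    memory is a list of (source_segment, target_segment) pairs; results are
    sorted with the best match first. The threshold is an illustrative choice.
    """
    hits = []
    for source, target in memory:
        # ratio() is 1.0 for identical strings and falls towards 0.0 as they diverge.
        score = SequenceMatcher(None, source.lower(), query.lower()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Hypothetical two-segment translation memory (English source, Polish target).
    memory = [
        ("Press the red button.", "Naciśnij czerwony przycisk."),
        ("Close the window.", "Zamknij okno."),
    ]
    for score, source, target in fuzzy_lookup(memory, "Press the green button."):
        print(f"{score:.2f}  {source}  ->  {target}")
```

This linear scan compares the query with every stored segment, which is exactly the kind of cost that becomes a problem for large memories, so performance-oriented tools rely on indexing instead.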

21 November 2016

Norbert Ryciak, Aleksander Wawer (Institute of Computer Science, Polish Academy of Sciences)

https://www.youtube.com/watch?v=hGKzZxFa0ik Using recursive deep neural networks and syntax to compute phrase semantics  Talk delivered in Polish.

The seminar presents initial experiments on recursive phrase-level sentiment computation using dependency syntax and deep learning. We discuss neural network architectures and implementations created within Clarin 2 and present results on English language resources. The seminar also covers ongoing work on Polish language resources.

5 December 2016

Dominika Rogozińska, Marcin Woliński (Institute of Computer Science, Polish Academy of Sciences)

Methods of syntax disambiguation for constituent parse trees in Polish as post–processing phase of the Świgra parser  Talk delivered in Polish.

The presentation shows methods of syntactic disambiguation of the parses that the Świgra parser produces for Polish utterances. The presented methods include probabilistic context-free grammars and maximum entropy models. The best of the described models achieves an efficiency measure of 96.2%. The outcome of our experiments is a module for post-processing Świgra's parses.

9 January 2017

Agnieszka Pluwak (Institute of Slavic Studies, Polish Academy of Sciences)

Building a domain-specific knowledge representation using an extended method of frame semantics on a corpus of Polish, English and German lease agreements  Talk delivered in Polish.

The FrameNet project is defined by its authors as a lexical base with some ontological features (though not an ontology sensu stricto, owing to its selective approach to the description of frames and lexical units, as well as of frame-to-frame relations). Ontologies, as knowledge representations in the field of NLP, should be applicable to specific domains and texts. However, in the FrameNet bibliography published before January 2016 I did not find a single knowledge representation based entirely on frames or on an extensive structure of frame-to-frame relations. I did find a few examples of domain-specific knowledge representations that use selected FrameNet frames, such as BioFrameNet or Legal FrameNet, where frames were applied to connect data from different sources. Therefore, in my dissertation, I decided to conduct an experiment and build a knowledge representation of frame-to-frame relations for the domain of lease agreements. The aim of my study was to describe frames useful for building a possible system for extracting data from lease agreements, that is, frames containing answers to the questions a professional analyst asks while reading lease agreements. In my work I asked several questions: Would I be able to use FrameNet frames for this purpose, or would I have to build my own frames? Will the analysis of Polish cause language-specific problems? How will the professional language affect the use of frames in context?

23 January 2017

Marek Rogalski (Lodz University of Technology)

Automatic paraphrasing  Talk delivered in Polish.

Paraphrasing is conveying the essential meaning of a message using different words. The ability to paraphrase is a measure of understanding. A teacher who asks a student "could you please tell us using your own words ..." is testing whether the student has understood the topic. In this presentation we will discuss the task of automatic paraphrasing. We will differentiate between syntax-level paraphrases and essential-meaning-level paraphrases. We will bring up several techniques from seemingly unrelated fields that can be applied in automatic paraphrasing. We will also show results that we have been able to produce with those techniques.

6 February 2017

Łukasz Kobyliński (Institute of Computer Science, Polish Academy of Sciences)

https://www.youtube.com/watch?v=TP9pmPKla1k Korpusomat  Talk delivered in Polish.

Korpusomat is a web tool facilitating unassisted creation of corpora for linguistic studies. After a set of text files is uploaded, the files are automatically analysed morphologically and lemmatised using Morfeusz, and disambiguated using the Concraft tagger. The resulting corpus can then be downloaded and analysed offline using the Poliqarp search engine to query for information related to text segmentation, base forms, inflectional interpretations and (dis)ambiguities. Poliqarp is also capable of calculating frequencies and applying basic statistical measures necessary for quantitative analysis. Apart from plain text files, Korpusomat can also process more complex textual formats such as popular EPUBs, download source data from the Internet, strip unnecessary information and extract document metadata.

20 February 2017 (NOTE: the talk was delivered at the Institute seminar)

Elżbieta Hajnicz (Institute of Computer Science, Polish Academy of Sciences)

https://youtu.be/lDKQ9jhIays Representation language of the valency dictionary Walenty  Talk delivered in Polish.

The Polish Valence Dictionary (Walenty) is intended to be used by natural language processing tools, particularly parsers, and thus it offers a formalized representation of the valency information. The talk presented the notion of valency and its representation in the dictionary, along with examples illustrating how particular syntactic and semantic language phenomena are modelled.

2 March 2017 (NOTE: the seminar will be held on Thursday, 10:15 am)

Wojciech Jaworski (University of Warsaw)

Integration of dependency parser with a categorial parser  Talk delivered in Polish.

In this talk I will describe the division of texts into sentences and the control of each parser's execution within the hybrid parser emerging in the Clarin-bis project. I will describe the adopted method of converting dependency structures to make them compatible with the structures of the categorial parser. The conversion has two aspects: changing the attributes of each node and changing the links between nodes. I will show how the method can be extended to convert the compressed forests generated by the Świgra parser. Finally, I will talk about the plans and goals for reimplementing the MateParser algorithm.

13 March 2017

Marek Kozłowski (National Information Processing Institute)

Internet model of Polish and semantic text processing  Talk delivered in Polish.

The presentation shows how BabelNet (a multilingual encyclopaedia and semantic network based on publicly available data sources such as Wikipedia and WordNet) can be used for grouping short texts, sentiment analysis, or emotional profiling of movies based on their subtitles. The second part presents work based on CommonCrawl, a publicly available petabyte-size open repository of multilingual Web pages. CommonCrawl was used to build two models of Polish: n-gram-based and semantic distribution-based.

20 March 2017

Jakub Szymanik

Exploring the Relation of Semantic Complexity and Quantifier Distribution in Large Corpora.  Talk delivered in Polish.

In this joint paper with Camilo Thorne, we study whether semantic complexity influences the distribution of generalized quantifiers in a large English corpus derived from Wikipedia. We consider the minimal computational device recognizing a generalized quantifier as the core measure of its semantic complexity. We regard quantifiers that belong to three increasingly more complex classes: Aristotelian (recognizable by 2-state acyclic finite automata), counting (k+2-state finite automata), and proportional quantifiers (pushdown automata). Using regression analysis we show that semantic complexity is a statistically significant factor explaining 27.29% of frequency variation. We compare this impact to that of other known sources of complexity, both semantic (quantifier monotonicity and the comparative/superlative distinction) and superficial (e.g., the length of quantifier surface forms). In general, we observe that the more complex a quantifier, the less frequent it is.
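
The three complexity classes named in the abstract can be illustrated with a small Python sketch. It is an assumption-laden illustration, not code from the paper: a sentence of the form "Q A are B" is encoded as a string with one symbol per element of A ('1' if the element is also in B, '0' otherwise), and the quantifier is the device that accepts or rejects that string. The encoding helper, the state names and the choice of "every", "exactly k" and "most" as representatives of the three classes are illustrative. For "most", a single integer counter stands in for the stack of a pushdown automaton.

```python
def run_dfa(transitions, start, accepting, word):
    """Run a deterministic finite automaton over a string of '0'/'1' symbols."""
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


def every():
    # Aristotelian: two states, "ok" (accepting) and "fail" (a sink entered on the first '0').
    transitions = {
        ("ok", "1"): "ok", ("ok", "0"): "fail",
        ("fail", "1"): "fail", ("fail", "0"): "fail",
    }
    return transitions, "ok", {"ok"}


def exactly(k):
    # Counting: states 0..k record how many '1's were seen; state k + 1 is an
    # overflow sink. That is k + 2 states in total.
    transitions = {}
    for state in range(k + 2):
        transitions[(state, "0")] = state
        transitions[(state, "1")] = min(state + 1, k + 1)
    return transitions, 0, {k}


def most(word):
    # Proportional: comparing the number of '1's and '0's needs unbounded memory,
    # so no finite automaton suffices. The counter below plays the role of a stack.
    balance = 0
    for symbol in word:
        balance += 1 if symbol == "1" else -1
    return balance > 0


def encode(a, b):
    """Encode the sets (A, B) as one symbol per element of A: '1' if it is in B, else '0'."""
    return "".join("1" if x in b else "0" for x in sorted(a))


if __name__ == "__main__":
    students = {"ann", "bob", "cy"}
    passed = {"ann", "bob"}
    word = encode(students, passed)
    print("every student passed:    ", run_dfa(*every(), word))
    print("exactly 2 students passed:", run_dfa(*exactly(2), word))
    print("most students passed:    ", most(word))
```

The amount of memory each recogniser needs (two states, k + 2 states, an unbounded counter) is the sense in which the classes are increasingly complex.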

27 March 2017

Paweł Morawiecki (Institute of Computer Science, Polish Academy of Sciences)

Introduction to deep neural networks  Talk delivered in Polish.

In the last few years, Deep Neural Networks (DNNs) have become the tool that provides the best solutions to many problems in image and speech recognition. In natural language processing, too, DNNs have revolutionized how translation and word representation are done, along with many other tasks. This presentation aims to provide good intuitions about DNNs, their core architectures and how they operate. I will discuss and suggest tools and source materials that can help in further exploration of the topic and in independent experiments.

10 April 2017

Paweł Morawiecki (Institute of Computer Science, Polish Academy of Sciences)

Talk title will be available shortly.  Talk delivered in Polish.

The summary will be available shortly.

Please see also the talks given between 2000 and 2015 and 2015-16.