Differences between revisions 47 and 50 (spanning 3 versions)

Natural Language Processing Seminar 2015–2016

The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa).

12 October 2015

Vincent Ng (University of Texas at Dallas)

Beyond OntoNotes Coreference

Recent years have seen considerable progress on the notoriously difficult task of coreference resolution, owing in part to the availability of coreference-annotated corpora such as MUC, ACE, and OntoNotes. Coreference, however, is more than MUC/ACE/OntoNotes coreference: it encompasses many interesting cases of anaphora that are not covered in the extensively investigated MUC/ACE/OntoNotes entity coreference task. The talk examined several comparatively less-studied coreference tasks that are arguably no less challenging than the MUC/ACE/OntoNotes entity coreference task, including the Winograd Schema Challenge, zero anaphora resolution, and event coreference resolution.

26 October 2015

Wojciech Jaworski (University of Warsaw)

Syntactic-semantic parser for Polish

The author presented the parser being developed within the CLARIN-PL project, its morphological pre-processing, the categorial grammar of Polish that the parser uses (integrated with a valency dictionary), and the semantic graph formalism used for meaning representation. He also discussed the algorithms used by the parser and its optimization strategies, which concern both performance and the concise representation of ambiguous syntactic and semantic parse trees.

16 November 2015

Izabela Gatkowska (Jagiellonian University in Kraków)

The Empirical Network of Lexical Links

The empirical network of lexical links is the result of an experiment that uses the human associative mechanism: the subject says the first word that comes to mind after understanding the stimulus word. The study was conducted in a cyclical manner, i.e. response words obtained in the first cycle were used as stimuli in the second cycle. This enabled the creation of a semantic network that differs both from networks built from text corpora, for example WORTSCHATZ, and from networks constructed by hand, for example WordNet. The empirically obtained connections between words in the network have a direction and a strength. The set of incoming and outgoing connections of a specific expression forms a lexical node of the network (a subnet). The way in which the network characterizes meaning is shown using feedback connections, a specific kind of dependency between two words that appears in a lexical node. A qualitative analysis based on the semantic lexical relations known in linguistics, and employed for example in the WordNet dictionary, permits an interpretation of only approximately 25% of the feedback links. The remaining links may be interpreted by referring to the model of meaning description proposed in the FrameNet dictionary. A qualitative interpretation of all the links found in a lexical node may permit a comparative study of lexical network nodes experimentally constructed for different natural languages, and may also allow the separation of the empirical semantic models realized by the same set of links between nodes in a given network.

30 November 2015

Dora Montagna (Universidad Autónoma de Madrid)

Semantic representation of a polysemous verb in Spanish

The author presented a theoretical model of meaning representation, based on Pustejovsky's theory of the Generative Lexicon. The proposal is intended as a basis for automatic disambiguation, but also as a new model of lexicographic description. The model will be applied to a highly productive verb in Spanish, assuming the hypothesis of verbal underspecification in order to establish patterns of semantic behavior.

7 December 2015

Łukasz Kobyliński (Institute of Computer Science, Polish Academy of Sciences), Witold Kieraś (University of Warsaw)

Morphosyntactic tagging of Polish – state of the art and future perspectives

The presentation discussed the state of the art in automatic morphosyntactic tagging of Polish-language text, with a particular focus on the performance of publicly available tools that can be used in real applications. A qualitative and quantitative analysis of the errors made by the taggers was conducted, along with a discussion of the possible causes of these problems and solutions to them. Tagging results for Polish were compared and contrasted with the results for other European languages.

8 December 2015

Salvador Pons Bordería (Universitat de València)

Discourse Markers from a pragmatic perspective: The role of discourse units in defining functions

One of the most disregarded aspects in the description of discourse markers is position. Notions such as "initial position" or "final position" are meaningless unless it can be specified with regard to what a DM is "initial" or "final". The presentation defended the idea that, for this question to be answered, appeal must be made to the notion of "discourse unit". Provided with a set of a) discourse units, and b) discourse positions, determining the function of a given DM is quasi-automatic.

11 January 2016

Małgorzata Marciniak, Agnieszka Mykowiecka, Piotr Rychlik (Institute of Computer Science, Polish Academy of Sciences)

Terminology extraction from Polish data – program TermoPL

The presentation addressed the problems of terminology extraction from Polish domain corpora. The authors described the C-value method, which ranks term candidates based on a frequency measure and the number of term contexts. The method takes into account nested terms that may not appear by themselves in the data. It yields several nested grammatical subphrases that are syntactically correct but semantically odd, such as 'USG jamy' ('USG of cavity'). The recognition of nested terms is supported by word connection strength, which makes it possible to eliminate truncated phrases from the top part of the term list. The talk concluded with a demo of the TermoPL tool.
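
As a rough illustration of the C-value ranking mentioned in the abstract, the sketch below uses the textbook formulation (Frantzi and Ananiadou), which may differ from the variant implemented in TermoPL. A candidate's frequency is discounted by the mean frequency of the longer candidates that contain it and then weighted by the logarithm of its length in words. The function names and the toy counts are hypothetical and do not come from the talk.

```python
import math
from collections import defaultdict


def occurs_in(short, longer):
    """True if `short` appears as a contiguous run of words inside `longer`."""
    n = len(short)
    return any(longer[i:i + n] == short for i in range(len(longer) - n + 1))


def c_value(term_freq):
    """Rank multi-word candidates with the textbook C-value formula.

    `term_freq` maps a candidate (a tuple of words) to its total corpus
    frequency, nested occurrences included. This is an assumption of the
    sketch, not a description of TermoPL internals.
    """
    # For every candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for longer in term_freq:
        for shorter in term_freq:
            if len(shorter) < len(longer) and occurs_in(shorter, longer):
                containers[shorter].append(longer)

    scores = {}
    for term, freq in term_freq.items():
        nested_in = containers.get(term, [])
        if nested_in:
            # Discount by the mean frequency of the containing candidates.
            freq -= sum(term_freq[c] for c in nested_in) / len(nested_in)
        # log2(1) == 0, so single-word candidates always score zero here.
        scores[term] = math.log2(len(term)) * freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy counts invented for illustration only.
toy_counts = {
    ("badanie", "USG", "jamy", "brzusznej"): 5,
    ("USG", "jamy", "brzusznej"): 8,
    ("USG", "jamy"): 8,
}
ranking = c_value(toy_counts)
```

With these toy counts the truncated phrase 'USG jamy' is discounted the most, because every one of its occurrences falls inside a longer candidate. On real data the discount alone does not always push such phrases down the list, which is the gap the word connection strength in the abstract is meant to close.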

25 January 2016

Wojciech Jaworski (University of Warsaw)

Syntactic-semantic parser for Polish: integration with lexical resources, parsing

During the lecture the author will talk about the integration of the syntactic-semantic parser with SGJP, Polimorf, Słowosieć and Walenty. He will present preliminary observations concerning the impact that checking semantic preferences has on parsing. He will also describe the categorial formalism used for parsing and briefly present how the parser works.

22 February 2016

Witold Dyrka (Wrocław University of Technology) – NOTE: the talk will start at 11:00.

Language(s) of proteins? – premises, contributions and perspectives

In his talk the author will present arguments in favour of treating protein sequences, or higher protein structures, as sentences in some language(s). He then plans to show several interesting results (his own and others') of applying quantitative methods of text analysis and formal linguistic tools (such as probabilistic context-free grammars) to the analysis of proteins. Finally, he will present plans for his further work on "protein linguistics", which, as he hopes, will inspire an interesting discussion.

7 March 2016

Zbigniew Bronk (Grammatical Dictionary of Polish team member)

JOD – a markup language for Polish declension

JOD, a markup language for Polish declension, was constructed in order to precisely describe inflectional rules and schemes for nouns and adjectives in Polish. Its first application was the description of the inflection of surnames, taking into account the sex of the person or persons bearing a given surname. This model has been the basis for the "Automaton of declension of Polish surnames". The author will present the general idea of the language and the implementation of its interpreter, as well as the JOD editor and the website "Automaton of declension of Polish surnames".

See the talks given between 2000 and 2015.

- Revision 47 as of 2015-12-21 14:29:38
  Size: 11230
  Editor: MaciejOgrodniczuk
  Comment:
+ Revision 50 as of 2016-01-17 20:36:01
  Size: 12083
  Editor: MaciejOgrodniczuk
  Comment:
 Line 9:
-||<style="border:0;padding-left:30px;padding-bottom:15px">Recent years have seen considerable progress on the notoriously difficult task of coreference resolution owing in part to the availability of coreference-annotated corpora such as MUC, ACE, and !OntoNotes. Coreference, however, is more than MUC/ACE/OntoNotes coreference: it encompasses many interesting cases of anaphora that are not covered in the extensively investigated MUC/ACE/OntoNotes entity coreference task. This talk examines several comparatively less-studied coreference tasks that are arguably no less challenging than the MUC/ACE/OntoNotes entity coreference task, including the Winograd Schema Challenge, zero anaphora resolution, and event coreference resolution.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">Recent years have seen considerable progress on the notoriously difficult task of coreference resolution owing in part to the availability of coreference-annotated corpora such as MUC, ACE, and !OntoNotes. Coreference, however, is more than MUC/ACE/OntoNotes coreference: it encompasses many interesting cases of anaphora that are not covered in the extensively investigated MUC/ACE/OntoNotes entity coreference task. This talk examined several comparatively less-studied coreference tasks that were arguably no less challenging than the MUC/ACE/OntoNotes entity coreference task, including the Winograd Schema Challenge, zero anaphora resolution, and event coreference resolution.||
 Line 14:
-||<style="border:0;padding-left:30px;padding-bottom:15px">The author will present the parser being developed within CLARIN-PL project, its morphological pre-processing, a categorial grammar of Polish integrated with valency dictionary and used by the parser and the semantic graph formalism used for meaning representation. It will also discuss algorithms used by the parser and optimization strategies, both related to performance and concise representation of ambiguous syntactic and semantic parsing trees.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">The author presented the parser being developed within CLARIN-PL project, its morphological pre-processing, a categorial grammar of Polish integrated with valency dictionary and used by the parser and the semantic graph formalism used for meaning representation. He also discussed algorithms used by the parser and optimization strategies, both related to performance and concise representation of ambiguous syntactic and semantic parsing trees.||
 Line 19:
-||<style="border:0;padding-left:30px;padding-bottom:15px">The empirical network of lexical links is the result of an experiment using a human associative mechanism – the person who is the subject of the research says the test first word that comes to his mind after understanding the stimulus word. The study was conducted in a cyclical manner, i.e. response words obtained in the first cycle were used as stimuli in the second cycle, which enabled the creation of a semantic network, which differs from the network created with the bodies of a text, for example, WORTSCHATZ and a network constructed by hand, for example. !WordNet. The empirically obtained words, which are derived from those words in the network, have a direction and power connections.  The set of incoming and outgoing connections, in which is found a specific expression, creates a lexical node network (subnet). The manner in which the network characterizes meaning, is  shown in the example of feedback connections which are a specific example of the dependencies which appear between two words, appearing in the lexical node. A qualitative analysis of the semantic lexical relations known in linguistics, and employed for example in the !WordNet dictionary, permit an interpretation of only approximately 25% of linkage feedback.  The remaining links may  be interpreted by referring to the model of the description of the significance as proposed in the !FrameNet dictionary. A qualitative interpretation of all the links found in the lexical node may permit a study of the comparative lexical network nodes experimentally constructed for different natural languages, and may also allow, a separation of empirical semantic models employed by the same set of links found between nodes in a given network.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">The empirical network of lexical links is the result of an experiment using a human associative mechanism – the person who is the subject of the research says the test first word that comes to his mind after understanding the stimulus word. The study was conducted in a cyclical manner, i.e. response words obtained in the first cycle were used as stimuli in the second cycle, which enabled the creation of a semantic network, which differs from the network created with the bodies of a text, for example, WORTSCHATZ and a network constructed by hand, for example. !WordNet. The empirically obtained words, which are derived from those words in the network, have a direction and power connections.  The set of incoming and outgoing connections, in which is found a specific expression, creates a lexical node network (subnet). The manner in which the network characterizes meaning, is shown in the example of feedback connections which are a specific example of the dependencies which appear between two words, appearing in the lexical node. A qualitative analysis of the semantic lexical relations known in linguistics, and employed for example in the !WordNet dictionary, permit an interpretation of only approximately 25% of linkage feedback.  The remaining links may  be interpreted by referring to the model of the description of the significance as proposed in the !FrameNet dictionary. A qualitative interpretation of all the links found in the lexical node may permit a study of the comparative lexical network nodes experimentally constructed for different natural languages, and may also allow, a separation of empirical semantic models employed by the same set of links found between nodes in a given network.||
 Line 24:
-||<style="border:0;padding-left:30px;padding-bottom:15px">The author will present a theoretical model of representation of meaning, based on Pustejovsky's theory of the Generative Lexicon. The proposal is intended as a base for automatic disambiguation, but also as a new model of lexicographic description. The model will be applied to a highly productive verb in Spanish, assuming the hypothesis of verbal underspecification in order to establish patterns of semantic behaviors.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">The author presented a theoretical model of representation of meaning, based on Pustejovsky's theory of the Generative Lexicon. The proposal is intended as a base for automatic disambiguation, but also as a new model of lexicographic description. The model will be applied to a highly productive verb in Spanish, assuming the hypothesis of verbal underspecification in order to establish patterns of semantic behaviors.||
 Line 29:
-||<style="border:0;padding-left:30px;padding-bottom:15px">During the presentation, the state of the art in the area of automatic approaches to morphosyntactic tagging of Polish language text will be discussed, with a particular focus on the analysis of performance of publicly available tools, which are possible to use in real applications. A qualitative and quantitative analysis of the errors made by the taggers will be conducted, along with a discussion on the possible causes and solutions to these problems. Tagging results for Polish will be compared and contrasted with the results for other European languages.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">During the presentation, the state of the art in the area of automatic approaches to morphosyntactic tagging of Polish language text was discussed, with a particular focus on the analysis of performance of publicly available tools, which are possible to use in real applications. A qualitative and quantitative analysis of the errors made by the taggers was conducted, along with a discussion on the possible causes and solutions to these problems. Tagging results for Polish was compared and contrasted with the results for other European languages.||
 Line 34:
-||<style="border:0;padding-left:30px;padding-bottom:15px">One of the most disregarded aspects in the description of discourse markers is position. Notions such as "initial position" or "final position" are meaningless unless it can be specified with regard to what a DM is "initial" or "final". This presentation will defend the idea that, for this question to be answered, appeal must be made to the notion of "discourse unit". Provided with a set of a) discourse units, and b) discourse positions, determining the function of a given DM is quasi-automatic.||
+||<style="border:0;padding-left:30px;padding-bottom:15px">One of the most disregarded aspects in the description of discourse markers is position. Notions such as "initial position" or "final position" are meaningless unless it can be specified with regard to what a DM is "initial" or "final". The presentation defended the idea that, for this question to be answered, appeal must be made to the notion of "discourse unit". Provided with a set of a) discourse units, and b) discourse positions, determining the function of a given DM is quasi-automatic.||
 Line 37:
-||<style="border:0;padding-left:30px;padding-bottom:0px">'''Małgorzata Marciniak, Agnieszka Mykowiecka, Piotr Rychlik''' (Institute of Computer Science, Polish Academy of Sciences) – '''NOTE: the talk will start at 13:00.'''||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''Terminology extraction from Polish data – program TermoPL''' &#160;{{attachment:icon-pl.gif|The talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Our presentation  addresses the problems of  terminology extraction from Polish domain corpora. We describe the C-value method to rank term candidates, which is based on frequency measure and number of term contexts. The method takes into account nested terms that may not appear by themselves in data. Using this method, we obtain several nested grammatical subphrases which are syntactically correct, but semantically odd, like USG jamy `USG of cavity’. We support the recognition of nested terms by word connection strength which allows us to eliminate truncated phrases from the top part of the term list. The talk is completed by the demo of the TermoPL tool.||
+||<style="border:0;padding-left:30px;padding-bottom:0px">'''Małgorzata Marciniak, Agnieszka Mykowiecka, Piotr Rychlik''' (Institute of Computer Science, Polish Academy of Sciences)||
||<style="border:0;padding-left:30px;padding-bottom:5px">'''[[attachment:seminarium/2016-01-11.pdf|Terminology extraction from Polish data – program TermoPL]]''' &#160;{{attachment:icon-pl.gif|The talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">The presentation addressed the problems of terminology extraction from Polish domain corpora. The authors described the C-value method to rank term candidates based on frequency measure and number of term contexts. The method takes into account nested terms that may not appear by themselves in data. Using this method, several nested grammatical subphrases are obtained which are syntactically correct, but semantically odd, like 'USG jamy' `USG of cavity’. The recognition of nested terms is supported by word connection strength which allows to eliminate truncated phrases from the top part of the term list. The talk was completed by the demo of the TermoPL tool.||
 Line 43:
-||<style="border:0;padding-left:30px;padding-bottom:5px">Title of the talk will be available shortly. &#160;{{attachment:icon-pl.gif|The talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Summary will be available shortly.||
+||<style="border:0;padding-left:30px;padding-bottom:5px">'''Syntactic-semantic parser for Polish: integration with lexical resources, parsing''' &#160;{{attachment:icon-pl.gif|The talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">During the lecture the author will talk about the integration of syntactic-semantic with SGJP, Polimorf, Słowosieć and Walenty. He will present preliminary observations concerning the impact that checking semantic preferences has on parsing. He will also describe a categorical formalism used to parse and present briefly how the parser works.||
 Line 48:
-||<style="border:0;padding-left:30px;padding-bottom:5px">Title of the talk will be available shortly. &#160;{{attachment:icon-pl.gif|The talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">Summary will be available shortly.||
+||<style="border:0;padding-left:30px;padding-bottom:5px">'''Language(s) of proteins? - premises, contributions and perspectives''' &#160;{{attachment:icon-pl.gif|The talk delivered in Polish.}}||
||<style="border:0;padding-left:30px;padding-bottom:15px">In his speech the author will present arguments in favour of treating protein sequences, or higher protein structures, as sentences in some language(s). Then he plans to show several interesting results (my own and others') of application of quantitative methods of text analysis, and formal linguistics tools (such as probabilistic context-free grammars) for the analysis of proteins. Eventually, he will present plans of his further work on the "protein linguistics", which - as he hopes - will inspire an interesting discussion.||
 Line 51:
-||<style="border:0;padding-top:5px;padding-bottom:5px">'''7 marca 2016'''||
+||<style="border:0;padding-top:5px;padding-bottom:5px">'''7 March 2016'''||

Diff for "seminar-archive"
