Diff for "PDB/PDBparser"

Differences between revisions 13 and 36 (spanning 23 versions)
Revision 13 as of 2018-03-22 16:33:17
Size: 6756
Comment:
Revision 36 as of 2019-01-18 12:00:22
Size: 8610
Comment:
Line 2: Line 2:
= PDBparser = == PDB-based dependency parsing models for Polish ==
Line 4: Line 4:
PDBparser is a Polish dependency parser trained on the current version of ([[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]]) with the publicly available parsing systems – [[http://maltparser.org|MaltParser]] or [[https://code.google.com/archive/p/mate-tools/|MateParser]]. `MaltParser` is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest-scoring dependency tree as a correct analysis of the input sentence. The PDB-based models are trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] with the publicly available parsing systems – [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest-scoring dependency tree as a correct analysis of the input sentence. */
Line 6: Line 6:
== PDB-based parsing models for Polish ==  * COMBO
  * [[attachment:190115_COMBO_PDB_nosem.pkl]] – PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
  * [[attachment: 190115_COMBO_PDB_sem.pkl]] – PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling
Line 8: Line 10:
=== MateParser ===  {{{#!wiki comment
 * '''NEW!''' PDB-based COMBO model compatible with the tagset of Morfeusz 2: [[attachment:180912_PDBCOMBO.pkl]]
Line 10: Line 13:
 * PDBMate model (compatible with the tagset of Morfeusz): [[attachment:170608_PDBMate.mdl]]
 * PDBMate model (compatible with the tagset of Morfeusz 2):
 * MateParser
Line 13: Line 15:
=== MaltParser ===
 * PDBMalt model (compatible with the tagset of Morfeusz): [[attachment:170608_PDBMalt.mco]]
 * PDBMalt model (compatible with the tagset of Morfeusz 2): [[attachment:180322_PDBMalt.mco]]
  * '''NEW!''' PDB-based Mate model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMate.mdl]]
  * PDB-based Mate model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMate.mdl]]}}}
Line 17: Line 18:
=== Semantic PDB models ===
 * Semantic PDBMate model:
 * Semantic PDBMalt model:
 * MateParser
  * [[attachment:]] – PDB-based MateParser model for dependency parsing
 * MaltParser
  * [[attachment:]] – PDB-based MaltParser model for dependency parsing

{{{#!wiki comment
  * '''NEW!''' PDB-based MaltParser model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMalt.mco]]
  * PDB-based MaltParser model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMalt.mco]]}}}


== PDBUD-based dependency parsing models for Polish ==
The PDBUD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] and [[https://github.com/360er0/COMBO|COMBO]].

 * COMBO
  * [[attachment: 190115_COMBO_PDBUD_nosem.pkl]] – PDBUD-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
 * UDPipe
  * tab

{{{#!wiki comment
 * [[http://mozart.ipipan.waw.pl/~prybak/model_poleval2018/model_A_semi.pkl|COMBO]] model for Polish (the model estimated for the [[http://poleval.pl/tasks#task1|PolEval 2018]] competition)
 * [[attachment:180606_PDBUDPipe.udpipe|UDPipe]] model for Polish}}}
Line 22: Line 41:

See [[http://clip.ipipan.waw.pl/benchmarks|Dependency parsing]] section.

{{{#!wiki comment
Line 59: Line 82:
}}}
Line 60: Line 84:
== Dependency parser integrated into Multiservice NLP for Polish == == PDB-based MaltParser in Multiservice ==
Line 66: Line 90:
<<BibMate(key, "wrob:14", omitYears=true)>> <<BibMate(key, "wro:14", omitYears=true)>>
Line 74: Line 98:
The dependency parsing models for Polish are released under the GNU General Public License v3 (GPL v.3) and by downloading them you accept the conditions of that licence. The dependency parsing models for Polish are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence and by downloading them you accept the conditions of that licence.

== Funding ==
The research was funded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.

PDB-based dependency parsing models for Polish

The PDB-based models are trained on the current version of Polish Dependency Bank with the publicly available parsing systems – COMBO, MateParser and MaltParser.

PDBUD-based dependency parsing models for Polish

The PDBUD-based models are trained on the current version of Polish Dependency Bank in Universal Dependencies format with the publicly available parsing systems – UDPipe and COMBO.

Parsing performance

See the Dependency parsing section.

PDB-based MaltParser in Multiservice

  • The performance of the MaltParser model for Polish may be tested in Multiservice NLP – http://multiservice.nlp.ipipan.waw.pl.

  • To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", input your text, and press the "Run" button.

  • To download the parser's output in CoNLL format, "Select output format:":
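
The downloaded CoNLL output can be inspected with a few lines of Python. The sketch below assumes the tab-separated, 10-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL); the exact columns produced by Multiservice are not stated on this page, so treat the column list as an assumption. The function names and the sample sentence, including its tags and dependency labels, are made up for illustration.

```python
from typing import Dict, List, Tuple

# Column names of the CoNLL-X format (assumed layout of the parser output).
CONLL_COLUMNS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conll(text: str) -> List[List[Token]]:
    """Split CoNLL text into sentences; each token is a column-name -> value dict."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Skip comment lines, if the file has any.
            continue
        current.append(dict(zip(CONLL_COLUMNS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences


def dependency_edges(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) triples; head 0 is the root."""
    forms = {"0": "ROOT"}
    forms.update({tok["id"]: tok["form"] for tok in sentence})
    return [(forms[tok["head"]], tok["deprel"], tok["form"]) for tok in sentence]


# Hypothetical sample, not actual parser output.
SAMPLE = (
    "1\tAla\tAla\tsubst\tsubst\t_\t2\tsubj\t_\t_\n"
    "2\tma\tmieć\tfin\tfin\t_\t0\tpred\t_\t_\n"
    "3\tkota\tkot\tsubst\tsubst\t_\t2\tobj\t_\t_\n"
)

if __name__ == "__main__":
    for sent in read_conll(SAMPLE):
        for head, rel, dep in dependency_edges(sent):
            print(f"{head} --{rel}--> {dep}")
```

Keeping tokens as plain dictionaries avoids any dependency on a CoNLL library; if the output turns out to use a different column set, only `CONLL_COLUMNS` needs to change.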

Publications

List of publications

Alina Wróblewska. Polish Dependency Parser Trained on an Automatically Induced Dependency Bank. Ph.D. dissertation, Institute of Computer Science, Polish Academy of Sciences, Warsaw, 2014.

Alina Wróblewska and Adam Przepiórkowski. Projection-based annotation of a Polish dependency treebank. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, pages 2306–2312, Reykjavík, Iceland, 2014. European Language Resources Association (ELRA).

Alina Wróblewska. Polish dependency bank. Linguistic Issues in Language Technology, 7(1), 2012.

Licensing

The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence and by downloading them you accept the conditions of that licence.

Funding

The research was funded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.

Contact

Any questions or comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.