Diff for "PDB/PDBparser"

Differences between revisions 1 and 33 (spanning 32 versions)
Revision 1 as of 2017-06-12 20:33:30
Size: 14
Comment:
Revision 33 as of 2019-01-18 11:28:28
Size: 8288
Comment:
#acl AlinaWroblewska:read,write,revert All:read
== PDB-based dependency parsing models for Polish ==

The PDB-based models are trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] with the publicly available parsing systems – [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The algorithm builds the dependency structure of an input sentence from transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given the training data and the parse history. ''MateParser'', in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them with an induced parsing model, and selects the highest-scoring dependency tree as the analysis of the input sentence. */
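
A transition-based parser such as MaltParser builds a dependency tree by applying transitions chosen by a classifier. The sketch below illustrates the idea under several assumptions: it uses the arc-standard transition system (only one of the systems such parsers can use), and it replaces the trained classifier with a hypothetical `scripted_predictor` that replays a fixed transition sequence. The function names, the example sentence and the transition sequence are illustrative and are not taken from MaltParser.

```python
from typing import Callable, List, Sequence, Tuple

Arc = Tuple[int, str, int]      # (head index, relation label, dependent index)
Transition = Tuple[str, str]    # (action name, relation label or "")
Predictor = Callable[[List[int], List[int], List[Transition]], Transition]


def parse(words: Sequence[str], predict: Predictor) -> List[Arc]:
    """Deterministic arc-standard parsing: apply one predicted transition per step."""
    stack: List[int] = [0]                              # 0 is the artificial root
    buffer: List[int] = list(range(1, len(words) + 1))  # word indices start at 1
    arcs: List[Arc] = []
    history: List[Transition] = []                      # the parse history
    while buffer or len(stack) > 1:
        action, label = predict(stack, buffer, history)
        if action == "SHIFT" and buffer:
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) > 2:
            # The top of the stack becomes the head of the item below it.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], label, dependent))
        elif action == "RIGHT-ARC" and len(stack) > 1:
            # The item below the top becomes the head of the top item.
            dependent = stack.pop()
            arcs.append((stack[-1], label, dependent))
        else:
            raise ValueError(f"illegal transition {action!r}")
        history.append((action, label))
    return arcs


def scripted_predictor(script: Sequence[Transition]) -> Predictor:
    """Hypothetical stand-in for a trained classifier: replays a fixed script."""
    def predict(stack: List[int], buffer: List[int],
                history: List[Transition]) -> Transition:
        return script[len(history)]
    return predict


WORDS = ["Ala", "ma", "kota"]
SCRIPT = [
    ("SHIFT", ""), ("SHIFT", ""), ("LEFT-ARC", "subj"),
    ("SHIFT", ""), ("RIGHT-ARC", "obj"), ("RIGHT-ARC", "root"),
]
ARCS = parse(WORDS, scripted_predictor(SCRIPT))
```

In a real parser the `predict` function would be a classifier that inspects the stack, the buffer and the history; the loop itself stays the same.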

 * COMBO
  * PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing: [[attachment:190115_COMBO_PDB_nosem.pkl]]
  * PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling: [[attachment:190115_COMBO_PDB_sem.pkl]]

 {{{#!wiki comment
 * '''NEW!''' PDB-based COMBO model compatible with the tagset of Morfeusz 2: [[attachment:180912_PDBCOMBO.pkl]]

 * MateParser

  * '''NEW!''' PDB-based Mate model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMate.mdl]]
  * PDB-based Mate model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMate.mdl]]}}}

 * MaltParser
  * PDB-based MaltParser model: [[attachment:]]

{{{#!wiki comment
  * '''NEW!''' PDB-based MaltParser model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMalt.mco]]
  * PDB-based MaltParser model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMalt.mco]]}}}


== PDBUD-based dependency parsing models for Polish ==
The PDBUD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] and [[https://github.com/360er0/COMBO|COMBO]].

 * [[http://mozart.ipipan.waw.pl/~prybak/model_poleval2018/model_A_semi.pkl|COMBO]] model for Polish (the model estimated for the [[http://poleval.pl/tasks#task1|PolEval 2018]] competition)
 * [[attachment:180606_PDBUDPipe.udpipe|UDPipe]] model for Polish

== Parsing performance ==

See the [[http://clip.ipipan.waw.pl/benchmarks|Dependency parsing]] section.

{{{#!wiki comment
=== 10-fold cross-validation (avg.) ===

|| '''Model''' || '''LAS''' || '''UAS''' ||
|| `PDBMate` || 0.85 || 0.89 ||
|| `PDBMalt` || 0.82 || 0.86 ||

=== Precision, recall and f-score of individual dependency relations (avg.) ===

A description of the Polish dependency relation types is available at [[http://zil.ipipan.waw.pl/PDB/DepRelTypes|Polish dependency relation types]].

||<rowspan=2> '''Dependency relation type''' |||| '''Precision''' |||| '''Recall''' |||| '''F-Measure''' ||
|| Mate || Malt || Mate || Malt || Mate || Malt ||
||<bgcolor="#eef3ff">abbrev_punct ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">0.97 ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff">0.98 ||
||adjunct || 0.89 || 0.73 || 0.92 || 0.77 || 0.82 || 0.75 ||
||<bgcolor="#eef3ff">adjunct_qt ||<bgcolor="#eef3ff"> 0.74 ||<bgcolor="#eef3ff"> 0.51 ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.58 ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.55 ||
||aglt || 1.00 || 0.98 || 1.00 || 0.98 || 0.98 || 0.98 ||
||<bgcolor="#eef3ff">app ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.58 ||<bgcolor="#eef3ff"> 0.69 ||<bgcolor="#eef3ff"> 0.52 ||<bgcolor="#eef3ff"> 0.72 ||<bgcolor="#eef3ff"> 0.55 ||
||aux || 0.95 || 0.90 || 0.97 || 0.92 || 0.96 || 0.91 ||
||<bgcolor="#eef3ff">comp ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff"> 0.85 ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.82 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.84 ||
||comp_ag || 0.95 || 0.90 || 0.96 || 0.91 || 0.94 || 0.90 ||
||<bgcolor="#eef3ff">comp_fin ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.86 ||<bgcolor="#eef3ff"> 0.79 ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.77 ||
||comp_inf || 0.95 || 0.91 || 0.96 || 0.90 || 0.93 || 0.90 ||
||<bgcolor="#eef3ff"> cond ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.97 ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.96 ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.96 ||
||conjunct || 0.85 || 0.71 || 0.82 || 0.65 || 0.82 || 0.68 ||
||<bgcolor="#eef3ff"> imp ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff"> 0.97 ||<bgcolor="#eef3ff"> 0.91 ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 0.92 ||
|| item || 0.87 || 0.40 || 0.73 || 0.37 || 0.61 || 0.39 ||
||<bgcolor="#eef3ff"> mwe ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff">0.87 ||<bgcolor="#eef3ff"> 0.79 ||
||ne || 0.87 || 0.78 || 0.73 || 0.64 || 0.76 || 0.70 ||
||<bgcolor="#eef3ff"> neg ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff"> 0.97 ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff"> 0.98 ||
||obj || 0.89 || 0.81 || 0.91 || 0.86 || 0.89 || 0.83 ||
||<bgcolor="#eef3ff"> obj_th ||<bgcolor="#eef3ff">0.83 ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.65 ||<bgcolor="#eef3ff">0.80 ||<bgcolor="#eef3ff"> 0.70 ||
||pd || 0.86 || 0.77 || 0.80 || 0.72 || 0.87 || 0.74 ||
||<bgcolor="#eef3ff"> pre_coord ||<bgcolor="#eef3ff">0.86 ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff">0.78 ||<bgcolor="#eef3ff"> 0.55 ||<bgcolor="#eef3ff">0.82 ||<bgcolor="#eef3ff"> 0.64 ||
||punct || 0.97 || 0.75 || 0.98 || 0.76 || 0.88 || 0.76 ||
||<bgcolor="#eef3ff">refl ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96 ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96 ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96 ||
||root || 0.91 || 0.80 || 0.91 || 0.81 || 0.94 ||0.80 ||
||<bgcolor="#eef3ff"> subj ||<bgcolor="#eef3ff"> 0.94||<bgcolor="#eef3ff"> 0.84 ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 0.84 ||
}}}
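
Parsing performance is usually reported as UAS (unlabelled attachment score, the share of tokens with the correct head) and LAS (labelled attachment score, the share of tokens with the correct head and the correct relation label). The sketch below shows how such scores, and per-relation precision, recall and F-measure, could be computed. It assumes that a prediction counts as correct for a relation only when both the head and the label match, and that all tokens, including punctuation, are scored; the function names and the toy data are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[int, str]  # (head index, dependency relation label)


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two aligned token sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    return uas_hits / len(gold), las_hits / len(gold)


def per_relation_prf(gold: List[Token],
                     pred: List[Token]) -> Dict[str, Tuple[float, float, float]]:
    """Return {relation: (precision, recall, F-measure)}.

    Assumption: a token is a hit for a relation only if head and label both match.
    """
    correct, n_gold, n_pred = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        n_gold[g[1]] += 1
        n_pred[p[1]] += 1
        if g == p:
            correct[g[1]] += 1
    scores = {}
    for rel in set(n_gold) | set(n_pred):
        precision = correct[rel] / n_pred[rel] if n_pred[rel] else 0.0
        recall = correct[rel] / n_gold[rel] if n_gold[rel] else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[rel] = (precision, recall, f_measure)
    return scores


# Toy example: the third token has the right head but the wrong label.
GOLD = [(2, "subj"), (0, "root"), (2, "obj"), (2, "punct")]
PRED = [(2, "subj"), (0, "root"), (2, "adjunct"), (2, "punct")]
UAS, LAS = attachment_scores(GOLD, PRED)
SCORES = per_relation_prf(GOLD, PRED)
```

Averaging these values over the folds of a cross-validation run gives figures comparable in kind to the averaged scores usually reported for such models, although the exact evaluation settings used for the published numbers are not stated here.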

== PDB-based MaltParser in Multiservice ==
 * The performance of the !MaltParser model for Polish can be tested in Multiservice NLP: [[http://multiservice.nlp.ipipan.waw.pl]].
 * To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
 * To download the parser's output in CoNLL format, "Select output format:":
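
Once the parser's output has been saved in CoNLL format, it can be read with a few lines of code. The sketch below assumes the ten tab-separated columns of the CoNLL-X format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and sentences separated by blank lines; whether the Multiservice output matches this layout exactly is an assumption. The function name `read_conll` and the sample sentence are illustrative.

```python
from typing import Dict, Iterator, List

# Column names of the CoNLL-X format (assumed layout of the downloaded file).
CONLL_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                "feats", "head", "deprel", "phead", "pdeprel"]


def read_conll(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of {column name: value} dicts."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLL_FIELDS):
            raise ValueError(f"expected {len(CONLL_FIELDS)} columns: {line!r}")
        sentence.append(dict(zip(CONLL_FIELDS, columns)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Illustrative sample; unknown columns are filled with "_".
SAMPLE = (
    "1\tAla\tAla\t_\t_\t_\t2\tsubj\t_\t_\n"
    "2\tma\tmieć\t_\t_\t_\t0\troot\t_\t_\n"
    "3\tkota\tkot\t_\t_\t_\t2\tobj\t_\t_\n"
    "\n"
)
SENTENCES = list(read_conll(SAMPLE))
TRIPLES = [(t["form"], t["head"], t["deprel"]) for t in SENTENCES[0]]
```

All values are kept as strings; convert `head` with `int()` when the tree structure is needed.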

== Publications ==
<<BibMate(key, "wro:14", omitYears=true)>>
<<BibMate(key, "wro:prz:14", omitYears=true)>>
<<BibMate(key, "wroblewska:12", omitYears=true)>>
<<BibMate(key, "awmw:departing", omitYears=true)>>


== Licensing ==

The dependency parsing models for Polish are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence. By downloading them, you accept the conditions of that licence.

== Funding ==
The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre, Poland, and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.


== Contact ==
Any questions or comments? Please send them to <<MailTo(alina AT SPAMFREE ipipan DOT waw DOT pl)>>.
