Diff for "PDB/PDBparser"

Differences between revisions 1 and 33 (spanning 32 versions)

PDB-based dependency parsing models for Polish

The PDB-based models are trained on the current version of the Polish Dependency Bank using the publicly available parsing systems – COMBO, MateParser and MaltParser.

COMBO
- PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing: 190115_COMBO_PDB_nosem.pkl
- PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling: 190115_COMBO_PDB_sem.pkl
- NEW! PDB-based COMBO model compatible with the tagset of Morfeusz 2: 180912_PDBCOMBO.pkl
MateParser
- NEW! PDB-based Mate model compatible with the tagset of Morfeusz 2: 180322_PDBMate.mdl
- PDB-based Mate model compatible with the tagset of Morfeusz: 170608_PDBMate.mdl
MaltParser
- PDB-based MaltParser model:

PDBUD-based dependency parsing models for Polish

The PDBUD-based models are trained on the current version of the Polish Dependency Bank in Universal Dependencies format using the publicly available parsing systems – UDPipe and COMBO.

COMBO model for Polish (the model estimated for the PolEval 2018 competition)
UDPipe model for Polish

Parsing performance

See Dependency parsing section.

10-fold cross-validation (avg.)

Model	LAS	UAS
`PDBMate`	0.85	0.89
`PDBMalt`	0.82	0.86
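
The two scores in the table can be illustrated with a short sketch. LAS (labelled attachment score) is the share of tokens that receive both the correct head and the correct relation label; UAS (unlabelled attachment score) only requires the correct head. The function name, the `(head, label)` tuple layout and the toy data below are assumptions made for illustration; the page does not say which evaluation script produced the reported figures, and conventions such as whether punctuation is scored may differ.

```python
from typing import List, Tuple

# One token's analysis: (head index, dependency label). Hypothetical layout.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) for two token-aligned lists of arcs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    # UAS counts tokens whose head is correct.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS additionally requires the correct relation label.
    correct_arcs = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return correct_arcs / total, correct_heads / total


# Toy example (invented data, not taken from PDB).
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "subj"), (0, "root"), (2, "comp"), (3, "punct")]
las, uas = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f}")
```

In a 10-fold cross-validation setting, such scores would be computed on each held-out fold and then averaged.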

Precision, recall and f-score of individual dependency relations (avg.)

Polish dependency relation types are described on the Polish dependency relation types page.

Dependency relation type	Precision (Mate)	Precision (Malt)	Recall (Mate)	Recall (Malt)	F-measure (Mate)	F-measure (Malt)
abbrev_punct	0.99	0.99	0.98	0.97	0.98	0.98
adjunct	0.89	0.73	0.92	0.77	0.82	0.75
adjunct_qt	0.74	0.51	0.76	0.58	0.75	0.55
aglt	1.00	0.98	1.00	0.98	0.98	0.98
app	0.75	0.58	0.69	0.52	0.72	0.55
aux	0.95	0.90	0.97	0.92	0.96	0.91
comp	0.90	0.85	0.87	0.82	0.88	0.84
comp_ag	0.95	0.90	0.96	0.91	0.94	0.90
comp_fin	0.87	0.75	0.86	0.79	0.87	0.77
comp_inf	0.95	0.91	0.96	0.90	0.93	0.90
cond	1.00	0.97	1.00	0.96	1.00	0.96
conjunct	0.85	0.71	0.82	0.65	0.82	0.68
imp	0.98	0.97	0.91	0.87	0.94	0.92
item	0.87	0.40	0.73	0.37	0.61	0.39
mwe	0.90	0.83	0.83	0.75	0.87	0.79
ne	0.87	0.78	0.73	0.64	0.76	0.70
neg	0.99	0.97	1.00	0.98	0.99	0.98
obj	0.89	0.81	0.91	0.86	0.89	0.83
obj_th	0.83	0.76	0.76	0.65	0.80	0.70
pd	0.86	0.77	0.80	0.72	0.87	0.74
pre_coord	0.86	0.76	0.78	0.55	0.82	0.64
punct	0.97	0.75	0.98	0.76	0.88	0.76
refl	0.99	0.96	0.99	0.96	0.99	0.96
root	0.91	0.80	0.91	0.81	0.94	0.80
subj	0.94	0.84	0.94	0.83	0.94	0.84
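
As a rough illustration of how per-relation figures like those above can be obtained, the sketch below computes precision, recall and F-measure for each relation label from token-aligned gold and predicted arcs. It assumes that an arc counts as correct only when both head and label match, and that the F-measure is the harmonic mean of precision and recall; the page does not state the exact definitions used, so the function name, data layout and toy data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One token's analysis: (head index, relation label). Hypothetical layout.
Arc = Tuple[int, str]


def per_relation_scores(
    gold: List[Arc], predicted: List[Arc]
) -> Dict[str, Tuple[float, float, float]]:
    """Map each relation label to (precision, recall, F-measure)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    gold_count = Counter(label for _, label in gold)
    pred_count = Counter(label for _, label in predicted)
    # Assumption: an arc is correct when head AND label both match.
    correct = Counter(g[1] for g, p in zip(gold, predicted) if g == p)
    scores = {}
    for label in sorted(set(gold_count) | set(pred_count)):
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        recall = correct[label] / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


# Toy example (invented data, not taken from PDB).
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "obj")]
pred = [(2, "subj"), (0, "root"), (2, "obj"), (2, "comp")]
scores = per_relation_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}\t{p:.2f}\t{r:.2f}\t{f:.2f}")
```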

PDB-based MaltParser in Multiservice

The performance of the MaltParser model for Polish may be tested in Multiservice NLP – http://multiservice.nlp.ipipan.waw.pl.
To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
To download the parser's output in CoNLL format, "Select output format:":
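
Once the output has been downloaded, it can be read with a few lines of code. The sketch below assumes the ten tab-separated columns of the CoNLL-X format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and blank lines between sentences; whether Multiservice emits exactly this layout is an assumption and should be checked against a real download. The function name and the sample sentence, including its tags, are invented for illustration.

```python
from typing import Dict, List

# Column names of the CoNLL-X format (assumed layout of the downloaded file).
CONLL_COLUMNS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conll(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL text into sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        current.append(dict(zip(CONLL_COLUMNS, fields)))
    if current:
        sentences.append(current)
    return sentences


# Invented sample; the tags are illustrative, not real parser output.
sample = (
    "1\tAla\tAla\tsubst\tsubst\t_\t2\tsubj\t_\t_\n"
    "2\tśpi\tspać\tfin\tfin\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tinterp\tinterp\t_\t2\tpunct\t_\t_\n"
)
sentences = read_conll(sample)
for token in sentences[0]:
    print(token["form"], token["head"], token["deprel"])
```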

Publications

List of publications

Licensing

The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence; by downloading them you accept the conditions of that licence.

Funding

The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.

Contact

Any questions, comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.

- Revision 1 as of 2017-06-12 20:33:30
  Size: 14
  Editor: AlinaWroblewska
  Comment:
+ Revision 33 as of 2019-01-18 11:28:28
  Size: 8288
  Editor: AlinaWroblewska
  Comment:
 Line 1:
-= PDBparser=
+#acl AlinaWroblewska:read,write,revert All:read
== PDB-based dependency parsing models for Polish ==

The PDB-based models are trained on the current version of the [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] using the publicly available parsing systems – [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */

 * COMBO
  * PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing: [[attachment:190115_COMBO_PDB_nosem.pkl]]
  * PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling: [[attachment:190115_COMBO_PDB_sem.pkl]]

 {{{#!wiki comment 
 * '''NEW!''' PDB-based COMBO model compatible with the tagset of Morfeusz 2: [[attachment:180912_PDBCOMBO.pkl]]

 * MateParser

  * '''NEW!''' PDB-based Mate model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMate.mdl]]
  * PDB-based Mate model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMate.mdl]]}}}

 * MaltParser
  * PDB-based MaltParser model: [[attachment:]]

{{{#!wiki comment 
  * '''NEW!''' PDB-based MaltParser model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMalt.mco]]
  * PDB-based MaltParser model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMalt.mco]]}}}


== PDBUD-based dependency parsing models for Polish ==
The PDBUD-based models are trained on the current version of the [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] using the publicly available parsing systems – [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] and [[https://github.com/360er0/COMBO|COMBO]].

 * [[http://mozart.ipipan.waw.pl/~prybak/model_poleval2018/model_A_semi.pkl|COMBO]] model for Polish (the model estimated for the [[http://poleval.pl/tasks#task1|PolEval 2018]] competition)
 * [[attachment:180606_PDBUDPipe.udpipe|UDPipe]] model for Polish

== Parsing performance ==

See [[http://clip.ipipan.waw.pl/benchmarks|Dependency parsing]] section.

{{{#!wiki comment
=== 10-fold cross-validation (avg.) ===

|| '''Model''' || '''LAS''' || '''UAS''' ||
|| `PDBMate` || 0.85 || 0.89 ||
|| `PDBMalt` || 0.82 || 0.86 ||

=== Precision, recall and f-score of individual dependency relations (avg.) ===

Polish dependency relation types are described on the [[http://zil.ipipan.waw.pl/PDB/DepRelTypes|Polish dependency relation types]] page.

||<rowspan=2> '''Dependency relation type'''      |||| '''Precision''' |||| '''Recall'''  |||| '''F-Measure'''  ||
|| Mate || Malt   || Mate || Malt    || Mate || Malt ||
||<bgcolor="#eef3ff">abbrev_punct    ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">0.99   ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">0.97    ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff">0.98 ||
||adjunct         || 0.89 || 0.73   || 0.92 || 0.77    || 0.82 || 0.75 ||
||<bgcolor="#eef3ff">adjunct_qt      ||<bgcolor="#eef3ff"> 0.74 ||<bgcolor="#eef3ff"> 0.51   ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.58    ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.55 ||
||aglt            || 1.00 || 0.98   || 1.00 || 0.98    || 0.98 || 0.98 ||
||<bgcolor="#eef3ff">app             ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.58   ||<bgcolor="#eef3ff"> 0.69 ||<bgcolor="#eef3ff"> 0.52    ||<bgcolor="#eef3ff"> 0.72 ||<bgcolor="#eef3ff"> 0.55 ||
||aux             || 0.95 || 0.90   || 0.97 || 0.92    || 0.96 || 0.91 ||
||<bgcolor="#eef3ff">comp            ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff"> 0.85   ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.82    ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.84 ||
||comp_ag         || 0.95 || 0.90   || 0.96 || 0.91    || 0.94 || 0.90 ||
||<bgcolor="#eef3ff">comp_fin        ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.75   ||<bgcolor="#eef3ff"> 0.86 ||<bgcolor="#eef3ff"> 0.79    ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.77 ||
||comp_inf        || 0.95 || 0.91   || 0.96 || 0.90    || 0.93 || 0.90 ||
||<bgcolor="#eef3ff">   cond     ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.97	    ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 	0.96       ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.96    ||
||conjunct        || 0.85 || 0.71	    || 0.82 || 0.65	       || 0.82 || 0.68     ||
||<bgcolor="#eef3ff"> imp ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">	0.97    ||<bgcolor="#eef3ff"> 0.91 ||<bgcolor="#eef3ff"> 	0.87       ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff">  0.92   ||
|| item            || 0.87 || 0.40 || 0.73 || 0.37 || 0.61 || 0.39 ||
||<bgcolor="#eef3ff">    mwe    ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff">	0.83    ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.75	       ||<bgcolor="#eef3ff">0.87 ||<bgcolor="#eef3ff">  0.79   ||
||ne              ||  0.87 || 0.78	    || 0.73 || 0.64	       ||  0.76 ||  0.70     ||
||<bgcolor="#eef3ff">   neg     ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">	 0.97   ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.98	       ||<bgcolor="#eef3ff">0.99  ||<bgcolor="#eef3ff">  0.98   ||
||obj             || 0.89 || 0.81	    || 0.91 || 0.86	       ||  0.89 || 0.83     ||
||<bgcolor="#eef3ff">     obj_th   ||<bgcolor="#eef3ff">0.83 ||<bgcolor="#eef3ff">	0.76    ||<bgcolor="#eef3ff"> 0.76  ||<bgcolor="#eef3ff"> 	0.65       ||<bgcolor="#eef3ff">0.80 ||<bgcolor="#eef3ff">  0.70   ||
||pd              || 0.86 || 0.77	    || 0.80 || 0.72	       || 0.87 || 0.74      ||
||<bgcolor="#eef3ff">  pre_coord      ||<bgcolor="#eef3ff">0.86 ||<bgcolor="#eef3ff">	0.76    ||<bgcolor="#eef3ff">0.78 ||<bgcolor="#eef3ff"> 0.55	       ||<bgcolor="#eef3ff">0.82 ||<bgcolor="#eef3ff">   0.64  ||
||punct           || 0.97 || 0.75	    || 0.98 || 0.76	       || 0.88 || 0.76     ||
||<bgcolor="#eef3ff">refl            ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96	    ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96	       ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96     ||
||root            || 0.91 || 0.80	    || 0.91 || 0.81	       || 0.94 ||0.80     ||
||<bgcolor="#eef3ff">    subj    ||<bgcolor="#eef3ff"> 0.94||<bgcolor="#eef3ff">	   0.84 ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 	0.83       ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 0.84    ||
}}}

== PDB-based MaltParser in Multiservice ==
 * The performance of the !MaltParser model for Polish may be tested in Multiservice NLP – [[http://multiservice.nlp.ipipan.waw.pl]].
 * To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
 * To download the parser's output in CoNLL format, "Select output format:":  

== Publications ==
<<BibMate(key, "wro:14", omitYears=true)>>
<<BibMate(key, "wro:prz:14", omitYears=true)>>
<<BibMate(key, "wroblewska:12", omitYears=true)>>
<<BibMate(key, "awmw:departing", omitYears=true)>>


== Licensing ==

The dependency parsing models for Polish are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence; by downloading them you accept the conditions of that licence.

== Funding ==
The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.


== Contact ==
Any questions, comments? Please send them to <<MailTo(alina AT SPAMFREE ipipan DOT waw DOT pl)>>.