Diff for "PDB/PDBparser"

Differences between revision 2 (2017-06-12 20:35:18, size 94) and revision 3 (2017-06-27 09:27:28, size 6755).

#format wiki
#language en
= PDBparser =

PDBparser is a Polish dependency parser trained on the current version of the [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] with one of two publicly available parsing systems: [[http://maltparser.org|MaltParser]] or [[https://code.google.com/archive/p/mate-tools/|MateParser]].

`MaltParser` is a transition-based dependency parser that uses a deterministic parsing algorithm. The algorithm builds the dependency structure of an input sentence from a sequence of transitions (shift-reduce actions) predicted by a classifier, which learns from training data to predict the next transition given the parse history. `MateParser`, in turn, is a graph-based parser: it defines a space of well-formed candidate dependency trees for an input sentence, scores them with an induced parsing model, and selects the highest-scoring tree as the analysis of the sentence.
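
The two parsing strategies described above can be illustrated with a minimal sketch. It is not the code of `MaltParser` or `MateParser`: the arc-standard transition system, the gold-tree oracle standing in for the trained classifier, the brute-force search over candidate trees, and all function names are assumptions chosen for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int]  # (head, dependent); tokens are numbered from 1, 0 is the artificial root


def parse_arc_standard(n_tokens: int,
                       predict: Callable[[List[int], List[int]], str]) -> List[Arc]:
    """Transition-based parsing: apply the predicted shift-reduce actions one by one."""
    stack, buffer, arcs = [0], list(range(1, n_tokens + 1)), []
    while buffer or len(stack) > 1:
        action = predict(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) > 2:
            dependent = stack.pop(-2)          # second-topmost depends on topmost
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC" and len(stack) > 1:
            dependent = stack.pop()            # topmost depends on second-topmost
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"action {action!r} is not applicable")
    return arcs


def make_oracle(gold_heads: Dict[int, int]) -> Callable[[List[int], List[int]], str]:
    """Hypothetical stand-in for the classifier: reads actions off a projective gold tree."""
    def predict(stack: List[int], buffer: List[int]) -> str:
        if len(stack) > 2 and gold_heads[stack[-2]] == stack[-1]:
            return "LEFT-ARC"
        if (len(stack) > 1 and gold_heads[stack[-1]] == stack[-2]
                and all(gold_heads[tok] != stack[-1] for tok in buffer)):
            return "RIGHT-ARC"
        return "SHIFT"
    return predict


def is_tree(heads: Tuple[int, ...]) -> bool:
    """True if every token reaches the root 0 without running into a cycle."""
    for token in range(1, len(heads) + 1):
        seen, node = set(), token
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(n_tokens: int, score: Callable[[int, int], float]) -> Tuple[int, ...]:
    """Graph-based parsing by brute force: score every well-formed tree, keep the best."""
    candidates = (h for h in product(range(n_tokens + 1), repeat=n_tokens) if is_tree(h))
    return max(candidates,
               key=lambda h: sum(score(h[i], i + 1) for i in range(n_tokens)))


# Toy sentence "Ala ma kota": 'ma' (2) is the root, 'Ala' (1) and 'kota' (3) depend on it.
gold = {1: 2, 2: 0, 3: 2}
print(parse_arc_standard(3, make_oracle(gold)))

arc_scores = {(0, 2): 5.0, (2, 1): 4.0, (2, 3): 4.0}  # made-up arc scores
print(best_tree(3, lambda head, dep: arc_scores.get((head, dep), 0.0)))
```

A real transition-based parser replaces the oracle with a classifier over features of the stack, the buffer and the parse history, and a real graph-based parser replaces the exhaustive enumeration with an efficient search over the tree space.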

== Dependency parsing models for Polish ==

 * `MateParser` model for Polish: [[attachment:170608_PDBMate.mdl]]
 * `MaltParser` model for Polish: [[attachment:170608_PDBMalt.mco]]

The dependency parsing models for Polish are released under the GNU General Public License v3 (GPL v3). By downloading them you accept the conditions of that licence.
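
As a rough sketch of how the downloaded `MaltParser` model might be applied, the helper below only builds the command line and does not run it. The options `-c` (model name), `-i` (input), `-o` (output) and `-m parse` are standard `MaltParser` options; the jar file name, the input and output file names, the assumption that the `.mco` file lies in the working directory, and the assumption that the input is already tokenised and tagged in CoNLL format are all hypothetical. The `MaltParser` version matching this model is not stated on this page.

```python
from pathlib import Path
from typing import List


def maltparser_command(jar: str, model: str, conll_in: str, conll_out: str) -> List[str]:
    """Build (but do not run) a MaltParser call in parsing mode."""
    model_name = Path(model).stem  # MaltParser takes the model name without ".mco"
    return ["java", "-jar", jar,
            "-c", model_name,
            "-i", conll_in,
            "-o", conll_out,
            "-m", "parse"]


# Hypothetical file names; only the model name comes from the attachment above.
command = maltparser_command("maltparser.jar", "170608_PDBMalt.mco",
                             "input.conll", "parsed.conll")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once Java and the parser jar are available.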

== Parsing performance ==
=== 10-fold cross-validation (avg.) ===

|| '''Model''' || '''LAS''' || '''UAS''' ||
|| `Polish MateParser` || 0.85 || 0.91 ||
|| `Polish MaltParser` || 0.84 || 0.89 ||
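
For readers unfamiliar with the two measures: UAS (unlabelled attachment score) is the share of tokens that receive the correct head, and LAS (labelled attachment score) is the share that receive both the correct head and the correct relation label. The sketch below shows the computation on a made-up three-token sentence; the function name and the data are illustrative, and whether punctuation tokens were counted in the evaluation reported here is not stated on this page.

```python
from typing import List, Tuple

Analysis = List[Tuple[int, str]]  # one (head index, relation label) pair per token


def attachment_scores(gold: Analysis, predicted: Analysis) -> Tuple[float, float]:
    """Return (LAS, UAS) for token-aligned gold and predicted analyses."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("analyses must be non-empty and of equal length")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_both / len(gold), correct_heads / len(gold)


# Made-up example: the third token gets the right head but the wrong label.
gold = [(2, "subj"), (0, "pred"), (2, "obj")]
predicted = [(2, "subj"), (0, "pred"), (2, "adjunct")]
las, uas = attachment_scores(gold, predicted)
print(f"LAS = {las:.2f}, UAS = {uas:.2f}")
```

LAS can never exceed UAS, which matches the figures in the table above.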

=== Precision, recall and F-measure of individual dependency relations (avg.) ===

The Polish dependency relation types are described at [[http://zil.ipipan.waw.pl/PDB/DepRelTypes|Polish dependency relation types]].

||<rowspan=2> '''Dependency relation type''' |||| '''Precision''' |||| '''Recall''' |||| '''F-Measure''' ||
|| Mate || Malt || Mate || Malt || Mate || Malt ||
||<bgcolor="#eef3ff">abbrev_punct ||<bgcolor="#eef3ff">0.97 ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">0.96 ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff">0.97 ||
||adjunct || 0.88 || 0.76 || 0.84 || 0.79 || 0.82 || 0.78 ||
||<bgcolor="#eef3ff">adjunct_qt ||<bgcolor="#eef3ff"> 0.81 ||<bgcolor="#eef3ff"> 0.27 ||<bgcolor="#eef3ff"> 0.25 ||<bgcolor="#eef3ff"> 0.20 ||<bgcolor="#eef3ff"> 0.39 ||<bgcolor="#eef3ff"> 0.23 ||
||aglt || 0.98 || 0.98 || 0.98 || 0.98 || 0.98 || 0.98 ||
||<bgcolor="#eef3ff">app ||<bgcolor="#eef3ff"> 0.70 ||<bgcolor="#eef3ff"> 0.58 ||<bgcolor="#eef3ff"> 0.59 ||<bgcolor="#eef3ff"> 0.47 ||<bgcolor="#eef3ff"> 0.64 ||<bgcolor="#eef3ff"> 0.51 ||
||aux || 0.94 || 0.92 || 0.97 || 0.92 || 0.96 || 0.92 ||
||<bgcolor="#eef3ff">comp ||<bgcolor="#eef3ff"> 0.91 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.85 ||<bgcolor="#eef3ff"> 0.89 ||<bgcolor="#eef3ff"> 0.86 ||
||comp_ag || 0.93 || 0.90 || 0.95 || 0.90 || 0.94 || 0.90 ||
||<bgcolor="#eef3ff">comp_fin ||<bgcolor="#eef3ff"> 0.77 ||<bgcolor="#eef3ff"> 0.61 ||<bgcolor="#eef3ff"> 0.80 ||<bgcolor="#eef3ff"> 0.71 ||<bgcolor="#eef3ff"> 0.79 ||<bgcolor="#eef3ff"> 0.66 ||
||comp_inf || 0.93 || 0.90 || 0.93 || 0.90 || 0.93 || 0.90 ||
||<bgcolor="#eef3ff">complm ||<bgcolor="#eef3ff"> 0.89 ||<bgcolor="#eef3ff"> 0.86 ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.84 ||
||cond || 0.98 || 0.97 || 0.97 || 0.97 || 0.98 || 0.97 ||
||<bgcolor="#eef3ff">conjunct ||<bgcolor="#eef3ff"> 0.84 ||<bgcolor="#eef3ff"> 0.74 ||<bgcolor="#eef3ff"> 0.81 ||<bgcolor="#eef3ff"> 0.72 ||<bgcolor="#eef3ff"> 0.82 ||<bgcolor="#eef3ff"> 0.73 ||
||coord || 0.65 || 0.51 || 0.66 || 0.52 || 0.65 || 0.52 ||
||<bgcolor="#eef3ff">coord_punct ||<bgcolor="#eef3ff"> 0.58 ||<bgcolor="#eef3ff"> 0.38 ||<bgcolor="#eef3ff"> 0.55 ||<bgcolor="#eef3ff"> 0.48 ||<bgcolor="#eef3ff"> 0.56 ||<bgcolor="#eef3ff"> 0.42 ||
||imp || 1.00 || 1.00 || 0.91 || 0.74 || 0.95 || 0.85 ||
||<bgcolor="#eef3ff">item ||<bgcolor="#eef3ff"> 0.77 ||<bgcolor="#eef3ff"> 0.62 ||<bgcolor="#eef3ff"> 0.51 ||<bgcolor="#eef3ff"> 0.51 ||<bgcolor="#eef3ff"> 0.61 ||<bgcolor="#eef3ff"> 0.56 ||
||mwe || 0.95 || 0.87 || 0.88 || 0.80 || 0.91 || 0.84 ||
||<bgcolor="#eef3ff">ne ||<bgcolor="#eef3ff"> 0.85 ||<bgcolor="#eef3ff"> 0.79 ||<bgcolor="#eef3ff"> 0.69 ||<bgcolor="#eef3ff"> 0.58 ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.67 ||
||neg || 0.98 || 0.98 || 0.98 || 0.99 || 0.93 || 0.98 ||
||<bgcolor="#eef3ff">obj ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.81 ||<bgcolor="#eef3ff"> 0.91 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.89 ||<bgcolor="#eef3ff"> 0.84 ||
||obj_th || 0.83 || 0.79 || 0.76 || 0.68 || 0.79 || 0.73 ||
||<bgcolor="#eef3ff">pd ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff"> 0.80 ||<bgcolor="#eef3ff"> 0.84 ||<bgcolor="#eef3ff"> 0.78 ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.79 ||
||pre_coord || 0.88 || 0.85 || 0.90 || 0.57 || 0.89 || 0.68 ||
||<bgcolor="#eef3ff">pred ||<bgcolor="#eef3ff"> 0.94 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.94 ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.94 ||<bgcolor="#eef3ff"> 0.88 ||
||punct || 0.87 || 0.81 || 0.89 || 0.81 || 0.88 || 0.81 ||
||<bgcolor="#eef3ff">refl ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff"> 0.97 ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff"> 0.97 ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff"> 0.97 ||
||subj || 0.91 || 0.87 || 0.92 || 0.86 || 0.92 || 0.86 ||
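
Per-relation figures of this kind are commonly computed as sketched below. The page does not say which evaluation tool or exact definition was used, so the definition here is an assumption: a token counts as correct for a relation only if both its head and its label match the gold standard, precision divides by the number of tokens predicted with that label, and recall divides by the number of tokens bearing it in the gold standard.

```python
from collections import Counter
from typing import Dict, List, Tuple

Analysis = List[Tuple[int, str]]  # one (head index, relation label) pair per token


def per_relation_scores(gold: Analysis,
                        predicted: Analysis) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, F-measure)} for token-aligned analyses."""
    gold_count, predicted_count, correct = Counter(), Counter(), Counter()
    for (gold_head, gold_label), (pred_head, pred_label) in zip(gold, predicted):
        gold_count[gold_label] += 1
        predicted_count[pred_label] += 1
        if gold_head == pred_head and gold_label == pred_label:
            correct[gold_label] += 1
    scores = {}
    for label in set(gold_count) | set(predicted_count):
        p = correct[label] / predicted_count[label] if predicted_count[label] else 0.0
        r = correct[label] / gold_count[label] if gold_count[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = (p, r, f)
    return scores


# Made-up example: one 'obj' token is mislabelled as 'adjunct'.
gold = [(2, "subj"), (0, "pred"), (2, "obj"), (3, "adjunct")]
predicted = [(2, "subj"), (0, "pred"), (2, "adjunct"), (3, "adjunct")]
for label, (p, r, f) in sorted(per_relation_scores(gold, predicted).items()):
    print(f"{label}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

Because the table reports averages over ten folds, an averaged F-measure need not equal the harmonic mean of the averaged precision and recall shown in the same row.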

== Dependency parser integrated into Multiservice NLP for Polish ==
The `MaltParser` model for Polish can be tested in Multiservice NLP: [[http://multiservice.nlp.ipipan.waw.pl]].
To parse a Polish text in Multiservice, use "select predefined chain of actions" to choose the chain "3: Pantera, `DependencyParser`", enter your text, and press the "Run" button.

== Publications ==
<<BibMate(key, "wrob:14", omitYears=true)>>
<<BibMate(key, "wro:prz:14", omitYears=true)>>
<<BibMate(key, "wroblewska:12", omitYears=true)>>
<<BibMate(key, "awmw:deparsing", omitYears=true)>>


== Contact ==
Questions or comments? Please send them to <<MailTo(alina AT SPAMFREE ipipan DOT waw DOT pl)>>.
