Diff for "PDB/PDBparser"

COMBO-pytorch model for dependency parsing only (with HerBERT-base embeddings),
COMBO model for dependency parsing only
COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
COMBO model for (semantic) dependency parsing only
MATE model for dependency parsing
MaltParser model for dependency parsing

PDB-UD-trained dependency parsing models for Polish

The PDB-UD-based models are trained on the current version of the Polish Dependency Bank in Universal Dependencies format using publicly available parsing systems: COMBO-pytorch, COMBO, and UDPipe.

COMBO-pytorch model for part-of-speech tagging, lemmatisation, and dependency parsing (with HerBERT-base embeddings),
COMBO-pytorch model for part-of-speech tagging, lemmatisation, and dependency parsing (with HerBERT-large embeddings),
COMBO-pytorch model for part-of-speech tagging, lemmatisation, and dependency parsing (with fastText embeddings),
COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
UDPipe model for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
UDPipe model for tokenisation

COMBO model for Polish (the model estimated for the PolEval 2018 competition)
UDPipe model for Polish

190115_COMBO_PDB_nosem.pkl – PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
190115_COMBO_PDB_sem.pkl – PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling

NEW! PDB-based COMBO model compatible with the tagset of Morfeusz 2: 180912_PDBCOMBO.pkl
MateParser
- NEW! PDB-based Mate model compatible with the tagset of Morfeusz 2: 180322_PDBMate.mdl
- PDB-based Mate model compatible with the tagset of Morfeusz: 170608_PDBMate.mdl
MateParser
- 190125_MATE_PDB.model – PDB-based MateParser model for dependency parsing
MaltParser
- 190125_MALT_PDB.mco – PDB-based MaltParser model for dependency parsing
- NEW! PDB-based MaltParser model compatible with the tagset of Morfeusz 2: 180322_PDBMalt.mco
- PDB-based MaltParser model compatible with the tagset of Morfeusz: 170608_PDBMalt.mco

10-fold cross-validation (avg.)

Model	LAS	UAS
`PDBMate`	0.85	0.89
`PDBMalt`	0.82	0.86
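
UAS (unlabelled attachment score) is the share of tokens that receive the correct head; LAS (labelled attachment score) additionally requires the correct dependency label. The following is a minimal sketch of how such scores can be computed, assuming token-aligned gold and predicted analyses. The function name and the toy data are illustrative only and are not taken from the evaluation reported above.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS counts tokens whose head is correct, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label are both correct.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), full_hits / len(gold)


# Toy four-token sentence, invented for illustration.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "subj"), (0, "root"), (2, "comp"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In a 10-fold setting, scores like these would be computed on each held-out fold and then averaged.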

Precision, recall and f-score of individual dependency relations (avg.)

A description of the Polish dependency relation types is available on the page Polish dependency relation types.

Dependency relation type	Precision (Mate)	Precision (Malt)	Recall (Mate)	Recall (Malt)	F-Measure (Mate)	F-Measure (Malt)
abbrev_punct	0.99	0.99	0.98	0.97	0.98	0.98
adjunct	0.89	0.73	0.92	0.77	0.82	0.75
adjunct_qt	0.74	0.51	0.76	0.58	0.75	0.55
aglt	1.00	0.98	1.00	0.98	0.98	0.98
app	0.75	0.58	0.69	0.52	0.72	0.55
aux	0.95	0.90	0.97	0.92	0.96	0.91
comp	0.90	0.85	0.87	0.82	0.88	0.84
comp_ag	0.95	0.90	0.96	0.91	0.94	0.90
comp_fin	0.87	0.75	0.86	0.79	0.87	0.77
comp_inf	0.95	0.91	0.96	0.90	0.93	0.90
cond	1.00	0.97	1.00	0.96	1.00	0.96
conjunct	0.85	0.71	0.82	0.65	0.82	0.68
imp	0.98	0.97	0.91	0.87	0.94	0.92
item	0.87	0.40	0.73	0.37	0.61	0.39
mwe	0.90	0.83	0.83	0.75	0.87	0.79
ne	0.87	0.78	0.73	0.64	0.76	0.70
neg	0.99	0.97	1.00	0.98	0.99	0.98
obj	0.89	0.81	0.91	0.86	0.89	0.83
obj_th	0.83	0.76	0.76	0.65	0.80	0.70
pd	0.86	0.77	0.80	0.72	0.87	0.74
pre_coord	0.86	0.76	0.78	0.55	0.82	0.64
punct	0.97	0.75	0.98	0.76	0.88	0.76
refl	0.99	0.96	0.99	0.96	0.99	0.96
root	0.91	0.80	0.91	0.81	0.94	0.80
subj	0.94	0.84	0.94	0.83	0.94	0.84
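
Per-relation precision, recall and F-measure can be computed as sketched below. The page does not state which convention was used, so this sketch assumes the common one in which a predicted arc counts as correct for a label only when both its head and its label match the gold standard. The function name and the toy data are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One token's analysis: (index of its head, dependency label).
Arc = Tuple[int, str]


def per_label_prf(gold: List[Arc], pred: List[Arc]) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, F1)} for token-aligned arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    gold_n, pred_n, correct = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        gold_n[g[1]] += 1
        pred_n[p[1]] += 1
        if g == p:  # assumed convention: head and label must both match
            correct[g[1]] += 1
    scores = {}
    for label in sorted(set(gold_n) | set(pred_n)):
        precision = correct[label] / pred_n[label] if pred_n[label] else 0.0
        recall = correct[label] / gold_n[label] if gold_n[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data, invented for illustration.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "adjunct")]
pred = [(2, "subj"), (0, "root"), (2, "adjunct"), (2, "adjunct")]

scores = per_label_prf(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label}\t{p:.2f}\t{r:.2f}\t{f:.2f}")
```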

COMBO demo
MaltParser demo in Multiservice NLP
- To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
- To download the parser's output in CoNLL format, choose that format under "Select output format:".
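
Parser output downloaded in CoNLL format can be read with a few lines of code. The sketch below is a hedged illustration, not part of any of the tools above: it assumes the usual tab-separated layout with one token per line, blank lines between sentences, and the head and dependency relation in the seventh and eighth columns (as in CoNLL-X and CoNLL-U). The function name and the sample sentence are made up for the example.

```python
from typing import Dict, Iterator, List


def read_conll(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts from CoNLL-style text."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines (used in CoNLL-U) are skipped.
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        # Assumed column positions: ID, FORM, ..., HEAD (7th), DEPREL (8th).
        sentence.append(
            {"id": cols[0], "form": cols[1], "head": cols[6], "deprel": cols[7]}
        )
    if sentence:
        yield sentence


# Invented two-token sample in the assumed layout.
sample = (
    "1\tAla\tAla\tsubst\tsubst\t_\t2\tsubj\t_\t_\n"
    "2\tśpi\tspać\tfin\tfin\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = list(read_conll(sample))
for token in sentences[0]:
    print(token["form"], token["head"], token["deprel"])
```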

List of publications

Alina Wróblewska. Polish Dependency Parser Trained on an Automatically Induced Dependency Bank. Ph.D. dissertation, Institute of Computer Science, Polish Academy of Sciences, Warsaw, 2014.

Alina Wróblewska. Polish dependency bank. Linguistic Issues in Language Technology, 7(1), 2012.

-  ⇤ ← Revision 1 as of 2017-06-12 20:33:30 → 
  Size: 14
  Editor: AlinaWroblewska
  Comment:
+   ← Revision 95 as of 2022-09-08 16:25:36 → ⇥
  Size: 12819
  Editor: AlinaWroblewska
  Comment:
 Line 1:
-= PDBparser=
+#acl AlinaWroblewska:read,write,revert All:read
== Polish COMBO models ==

The [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO]] models for Polish are trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]]. The models use the [[https://huggingface.co/allegro/herbert-base-cased|HerBERT]] language model.

== PDB-trained models ==
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_parseonly_220906.tar.gz|model]] for dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_full_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types '''without''' semantic extensions, e.g. adjunct instead of adjunct_temp)
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_full_SEMLAB_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types '''with''' semantic extensions, e.g. adjunct_temp)

== PDB-UD-trained model ==
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDBUD_full_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing

{{{#!wiki comment
[[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */

 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_parseonly_220906.tar.gz|COMBO-pytorch model]] for dependency parsing only (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings),
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling

 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only

 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/MATE/20190612_MATE_PDB.pkl|MATE model]] for dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/MALT/190125_MALT_PDB.mco|MaltParser model]] for dependency parsing


== PDB-UD-trained dependency parsing models for Polish ==
The PDB-UD-based models are trained on the current version of the [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] using publicly available parsing systems: [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO-pytorch]], [[https://github.com/360er0/COMBO|COMBO]], and [[http://ufal.mff.cuni.cz/udpipe|UDPipe]].

 * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings),
 * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-large.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-large-cased|HerBERT-large]] embeddings),
 * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-ud27.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with fastText embeddings),
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_ttp_embedd.udpipe|UDPipe model]] for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation}}}

{{{#!wiki comment 
 * [[http://mozart.ipipan.waw.pl/~prybak/model_poleval2018/model_A_semi.pkl|COMBO]] model for Polish (the model estimated for the [[http://poleval.pl/tasks#task1|PolEval 2018]] competition)
 * [[attachment:180606_PDBUDPipe.udpipe|UDPipe]] model for Polish}}}

== Parsing performance (outdated) ==

See [[http://clip.ipipan.waw.pl/benchmarks#Dependency_parsing|Dependency parsing]] section.

{{{#!wiki comment
  * [[attachment:190115_COMBO_PDB_nosem.pkl]] – PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
  * [[attachment:190115_COMBO_PDB_sem.pkl]] – PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling

 * '''NEW!''' PDB-based COMBO model compatible with the tagset of Morfeusz 2: [[attachment:180912_PDBCOMBO.pkl]]

 * MateParser

  * '''NEW!''' PDB-based Mate model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMate.mdl]]
  * PDB-based Mate model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMate.mdl]]

 * MateParser 
  * [[attachment:190125_MATE_PDB.model]] – PDB-based MateParser model for dependency parsing
 * MaltParser
  * [[attachment:190125_MALT_PDB.mco]] – PDB-based MaltParser model for dependency parsing


  * '''NEW!''' PDB-based MaltParser model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMalt.mco]]
  * PDB-based MaltParser model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMalt.mco]]

=== 10-fold cross-validation (avg.) ===

|| '''Model''' || '''LAS''' || '''UAS''' ||
|| `PDBMate` || 0.85 || 0.89 ||
|| `PDBMalt` || 0.82 || 0.86 ||

=== Precision, recall and f-score of individual dependency relations (avg.) ===

A description of the Polish dependency relation types is available on the page [[http://zil.ipipan.waw.pl/PDB/DepRelTypes|Polish dependency relation types]].

||<rowspan=2> '''Dependency relation type'''      |||| '''Precision''' |||| '''Recall'''  |||| '''F-Measure'''  ||
|| Mate || Malt   || Mate || Malt    || Mate || Malt ||
||<bgcolor="#eef3ff">abbrev_punct    ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">0.99   ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">0.97    ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff">0.98 ||
||adjunct         || 0.89 || 0.73   || 0.92 || 0.77    || 0.82 || 0.75 ||
||<bgcolor="#eef3ff">adjunct_qt      ||<bgcolor="#eef3ff"> 0.74 ||<bgcolor="#eef3ff"> 0.51   ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.58    ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.55 ||
||aglt            || 1.00 || 0.98   || 1.00 || 0.98    || 0.98 || 0.98 ||
||<bgcolor="#eef3ff">app             ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.58   ||<bgcolor="#eef3ff"> 0.69 ||<bgcolor="#eef3ff"> 0.52    ||<bgcolor="#eef3ff"> 0.72 ||<bgcolor="#eef3ff"> 0.55 ||
||aux             || 0.95 || 0.90   || 0.97 || 0.92    || 0.96 || 0.91 ||
||<bgcolor="#eef3ff">comp            ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff"> 0.85   ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.82    ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.84 ||
||comp_ag         || 0.95 || 0.90   || 0.96 || 0.91    || 0.94 || 0.90 ||
||<bgcolor="#eef3ff">comp_fin        ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.75   ||<bgcolor="#eef3ff"> 0.86 ||<bgcolor="#eef3ff"> 0.79    ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.77 ||
||comp_inf        || 0.95 || 0.91   || 0.96 || 0.90    || 0.93 || 0.90 ||
||<bgcolor="#eef3ff">   cond     ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.97	    ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 	0.96       ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.96    ||
||conjunct        || 0.85 || 0.71	    || 0.82 || 0.65	       || 0.82 || 0.68     ||
||<bgcolor="#eef3ff"> imp ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">	0.97    ||<bgcolor="#eef3ff"> 0.91 ||<bgcolor="#eef3ff"> 	0.87       ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff">  0.92   ||
||item            || 0.87 || 0.40   || 0.73 || 0.37    || 0.61 || 0.39 ||
||<bgcolor="#eef3ff">    mwe    ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff">	0.83    ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.75	       ||<bgcolor="#eef3ff">0.87 ||<bgcolor="#eef3ff">  0.79   ||
||ne              ||  0.87 || 0.78	    || 0.73 || 0.64	       ||  0.76 ||  0.70     ||
||<bgcolor="#eef3ff">   neg     ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">	 0.97   ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.98	       ||<bgcolor="#eef3ff">0.99  ||<bgcolor="#eef3ff">  0.98   ||
||obj             || 0.89 || 0.81	    || 0.91 || 0.86	       ||  0.89 || 0.83     ||
||<bgcolor="#eef3ff">     obj_th   ||<bgcolor="#eef3ff">0.83 ||<bgcolor="#eef3ff">	0.76    ||<bgcolor="#eef3ff"> 0.76  ||<bgcolor="#eef3ff"> 	0.65       ||<bgcolor="#eef3ff">0.80 ||<bgcolor="#eef3ff">  0.70   ||
||pd              || 0.86 || 0.77	    || 0.80 || 0.72	       || 0.87 || 0.74      ||
||<bgcolor="#eef3ff">  pre_coord      ||<bgcolor="#eef3ff">0.86 ||<bgcolor="#eef3ff">	0.76    ||<bgcolor="#eef3ff">0.78 ||<bgcolor="#eef3ff"> 0.55	       ||<bgcolor="#eef3ff">0.82 ||<bgcolor="#eef3ff">   0.64  ||
||punct           || 0.97 || 0.75	    || 0.98 || 0.76	       || 0.88 || 0.76     ||
||<bgcolor="#eef3ff">refl            ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96	    ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96	       ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96     ||
||root            || 0.91 || 0.80	    || 0.91 || 0.81	       || 0.94 ||0.80     ||
||<bgcolor="#eef3ff">    subj    ||<bgcolor="#eef3ff"> 0.94||<bgcolor="#eef3ff">	   0.84 ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 	0.83       ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 0.84    ||
}}}

== PDB-based dependency parsing demos ==

 * [[http://combo-demo.nlp.ipipan.waw.pl/combo-eng|English]]
 * [[http://combo-demo.nlp.ipipan.waw.pl/combo-pl|Polish]]

{{{#!wiki comment
 * [[http://scwad-demo.nlp.ipipan.waw.pl:8000/dependency-parsing|COMBO demo]]
 * [[http://multiservice.nlp.ipipan.waw.pl|MaltParser demo in Multiservice NLP]]
  * To parse a Polish text in Multiservice, choose "5: Concraft, !DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
  * To download the parser's output in CoNLL format, choose that format under "Select output format:".}}}

== Publications ==

<<BibMate(key, "kli:wro:2021b", omitYears=true)>>
<<BibMate(key, "wro:ryb:2019", omitYears=true)>> (Note: Please contact the first author to get a copy of this article.)
{{{#!wiki comment 
<<BibMate(key, "wro:14", omitYears=true)>>
<<BibMate(key, "wroblewska:12", omitYears=true)>>}}}


== Licensing ==

The dependency parsing models for Polish are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence. By downloading them, you accept the conditions of that licence.

== Acknowledgment ==
The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre, Poland, and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at the Poznań Supercomputing and Networking Center.


== Contact ==
Please send any questions or comments to <<MailTo(alina AT SPAMFREE ipipan DOT waw DOT pl)>>.
