Diff for "PDB/PDBparser"

Differences between revisions 59 and 77 (spanning 18 versions)

PDB-trained dependency parsing models for Polish

The PDB-based models are trained on the current version of the Polish Dependency Bank with publicly available parsing systems: COMBO, MateParser and MaltParser.

COMBO model for dependency parsing only
COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling

MATE model for dependency parsing
MaltParser model for dependency parsing

PDB-UD-trained dependency parsing models for Polish

The PDB-UD-based models are trained on the current version of the Polish Dependency Bank in Universal Dependencies format with publicly available parsing systems: UDPipe and COMBO.

COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
UDPipe model for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
UDPipe model for tokenisation

Parsing performance

See the Dependency parsing section.

190115_COMBO_PDB_nosem.pkl – PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
190115_COMBO_PDB_sem.pkl – PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling

NEW! PDB-based COMBO model compatible with the tagset of Morfeusz 2: 180912_PDBCOMBO.pkl
MateParser
- 190125_MATE_PDB.model – PDB-based MateParser model for dependency parsing
- NEW! PDB-based Mate model compatible with the tagset of Morfeusz 2: 180322_PDBMate.mdl
- PDB-based Mate model compatible with the tagset of Morfeusz: 170608_PDBMate.mdl
MaltParser
- 190125_MALT_PDB.mco – PDB-based MaltParser model for dependency parsing
- NEW! PDB-based MaltParser model compatible with the tagset of Morfeusz 2: 180322_PDBMalt.mco
- PDB-based MaltParser model compatible with the tagset of Morfeusz: 170608_PDBMalt.mco

10-fold cross-validation (avg.)

Model	LAS	UAS
`PDBMate`	0.85	0.89
`PDBMalt`	0.82	0.86
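LAS (labelled attachment score) is the share of tokens that receive both the correct head and the correct relation label; UAS (unlabelled attachment score) only requires the correct head. The page does not say which evaluation script produced the figures above, so the following is only a minimal sketch of the two measures. It assumes 10-column CoNLL input with HEAD in column 7 and DEPREL in column 8, and it counts every token, punctuation included. The function names and the toy sentence are illustrative.

```python
def read_arcs(conll_text):
    """Return one (head, deprel) pair per token line of a CoNLL file."""
    arcs = []
    for line in conll_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # sentence separators and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # e.g. multiword-token ranges in CoNLL-U files
        arcs.append((cols[6], cols[7]))  # HEAD and DEPREL columns
    return arcs


def attachment_scores(gold_text, pred_text):
    """Return (LAS, UAS) as fractions of all tokens."""
    gold, pred = read_arcs(gold_text), read_arcs(pred_text)
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted data must cover the same non-empty token sequence")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    return las_hits / len(gold), uas_hits / len(gold)


def make_conll(rows):
    """Build a toy 10-column CoNLL sentence from (form, head, deprel) triples."""
    return "\n".join(
        "\t".join([str(i), form, "_", "_", "_", "_", str(head), rel, "_", "_"])
        for i, (form, head, rel) in enumerate(rows, start=1)
    )


# Hypothetical gold and predicted analyses of one sentence.
GOLD = make_conll([("Ala", 2, "subj"), ("ma", 0, "root"), ("kota", 2, "obj"), (".", 2, "punct")])
PRED = make_conll([("Ala", 2, "subj"), ("ma", 0, "root"), ("kota", 2, "comp"), (".", 3, "punct")])

las, uas = attachment_scores(GOLD, PRED)
print(f"LAS={las:.2f} UAS={uas:.2f}")  # LAS=0.50 UAS=0.75
```

For a cross-validation setting such as the one reported here, the scores would be computed once per fold and then averaged.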

Precision, recall and f-score of individual dependency relations (avg.)

A description of the Polish dependency relation types is available on the Polish dependency relation types page.

Dependency relation type	Precision (Mate)	Precision (Malt)	Recall (Mate)	Recall (Malt)	F-measure (Mate)	F-measure (Malt)
abbrev_punct	0.99	0.99	0.98	0.97	0.98	0.98
adjunct	0.89	0.73	0.92	0.77	0.82	0.75
adjunct_qt	0.74	0.51	0.76	0.58	0.75	0.55
aglt	1.00	0.98	1.00	0.98	0.98	0.98
app	0.75	0.58	0.69	0.52	0.72	0.55
aux	0.95	0.90	0.97	0.92	0.96	0.91
comp	0.90	0.85	0.87	0.82	0.88	0.84
comp_ag	0.95	0.90	0.96	0.91	0.94	0.90
comp_fin	0.87	0.75	0.86	0.79	0.87	0.77
comp_inf	0.95	0.91	0.96	0.90	0.93	0.90
cond	1.00	0.97	1.00	0.96	1.00	0.96
conjunct	0.85	0.71	0.82	0.65	0.82	0.68
imp	0.98	0.97	0.91	0.87	0.94	0.92
item	0.87	0.40	0.73	0.37	0.61	0.39
mwe	0.90	0.83	0.83	0.75	0.87	0.79
ne	0.87	0.78	0.73	0.64	0.76	0.70
neg	0.99	0.97	1.00	0.98	0.99	0.98
obj	0.89	0.81	0.91	0.86	0.89	0.83
obj_th	0.83	0.76	0.76	0.65	0.80	0.70
pd	0.86	0.77	0.80	0.72	0.87	0.74
pre_coord	0.86	0.76	0.78	0.55	0.82	0.64
punct	0.97	0.75	0.98	0.76	0.88	0.76
refl	0.99	0.96	0.99	0.96	0.99	0.96
root	0.91	0.80	0.91	0.81	0.94	0.80
subj	0.94	0.84	0.94	0.83	0.94	0.84
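The per-relation figures can be read as follows: for a relation type, precision is the share of tokens predicted with that label that are correct, recall is the share of gold tokens with that label that the parser recovers, and the F-measure is their harmonic mean. The sketch below is one common way to compute them and is not necessarily the procedure behind the table; in particular, it assumes that a token counts as correct only when both its head and its label match the gold standard. All names and the toy data are illustrative.

```python
from collections import Counter


def per_relation_scores(gold, pred):
    """Map each relation label to (precision, recall, F-measure).

    `gold` and `pred` are parallel lists of (head, deprel) pairs, one per token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    gold_count = Counter(rel for _, rel in gold)
    pred_count = Counter(rel for _, rel in pred)
    # Assumption: a token is correct only if head AND label agree with gold.
    correct = Counter(g[1] for g, p in zip(gold, pred) if g == p)

    scores = {}
    for rel in sorted(set(gold_count) | set(pred_count)):
        precision = correct[rel] / pred_count[rel] if pred_count[rel] else 0.0
        recall = correct[rel] / gold_count[rel] if gold_count[rel] else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[rel] = (precision, recall, f_measure)
    return scores


# Hypothetical five-token example: the last adjunct is mislabelled as comp.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "adjunct"), (2, "adjunct")]
pred = [(2, "subj"), (0, "root"), (2, "obj"), (3, "adjunct"), (2, "comp")]

for rel, (p, r, f) in per_relation_scores(gold, pred).items():
    print(f"{rel}\t{p:.2f}\t{r:.2f}\t{f:.2f}")
```

In this toy example `adjunct` has precision 1.00 (the single predicted adjunct is right) but recall 0.50 (one of the two gold adjuncts is missed).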

PDB-based MaltParser in Multiservice

The performance of the MaltParser model for Polish can be tested in Multiservice NLP – http://multiservice.nlp.ipipan.waw.pl.
To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
To download the parser's output in CoNLL format, choose that format under "Select output format:".
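Once the output has been downloaded, it can be inspected with a few lines of Python. The sketch below assumes the common 10-column, tab-separated CoNLL layout (ID, FORM, ..., HEAD in column 7, DEPREL in column 8) with blank lines between sentences; the exact columns produced by Multiservice are not described on this page, so check them against a real file first. The sample text and all names are illustrative.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int
    form: str
    head: int
    deprel: str


def read_sentences(conll_text):
    """Split CoNLL text into sentences, each a list of Token tuples."""
    sentences, current = [], []
    for line in conll_text.splitlines():
        if not line.strip():            # a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():       # skip comments and non-token lines
            continue
        current.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def arcs(sentence):
    """Return (dependent form, relation, head form) triples for one sentence."""
    forms = {tok.idx: tok.form for tok in sentence}
    return [(tok.form, tok.deprel, forms.get(tok.head, "ROOT")) for tok in sentence]


def _row(idx, form, head, rel):
    """Build one toy 10-column CoNLL line."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), rel, "_", "_"])


# Hypothetical parser output for two short sentences.
SAMPLE = "\n".join([
    _row(1, "Ala", 2, "subj"),
    _row(2, "ma", 0, "root"),
    _row(3, "kota", 2, "obj"),
    _row(4, ".", 2, "punct"),
    "",
    _row(1, "Śpi", 0, "root"),
    _row(2, ".", 1, "punct"),
])

for sentence in read_sentences(SAMPLE):
    for dependent, relation, head in arcs(sentence):
        print(f"{dependent} --{relation}--> {head}")
    print()
```

To process a downloaded file instead of the sample, read it with `open(path, encoding="utf-8").read()` and pass the result to `read_sentences`.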

Publications

List of publications

Alina Wróblewska and Piotr Rybak. Dependency parsing of Polish. Poznań Studies in Contemporary Linguistics, 55(2):305–337, 2019.

(Note: Please contact the first author to get a copy of this article.)

Alina Wróblewska. Polish Dependency Parser Trained on an Automatically Induced Dependency Bank. Ph.D. dissertation, Institute of Computer Science, Polish Academy of Sciences, Warsaw, 2014.

Alina Wróblewska and Adam Przepiórkowski. Projection-based annotation of a Polish dependency treebank. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, pages 2306–2312, Reykjavík, Iceland, 2014. European Language Resources Association (ELRA).

Alina Wróblewska. Polish dependency bank. Linguistic Issues in Language Technology, 7(1), 2012.

Alina Wróblewska and Marcin Woliński. Preliminary experiments in Polish dependency parsing. In Pascal Bouvry, Mieczysław A. Kłopotek, Franck Leprevost, Małgorzata Marciniak, Agnieszka Mykowiecka, and Henryk Rybiński, editors, Security and Intelligent Information Systems: International Joint Conference, SIIS 2011, Warsaw, Poland, June 13-14, 2011, Revised Selected Papers, number 7053 in Lecture Notes in Computer Science, pages 279–292. Springer-Verlag, 2012.

Licensing

The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence and by downloading them you accept the conditions of that licence.

Acknowledgment

The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at the Poznań Supercomputing and Networking Center.

Contact

Any questions, comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.

- Revision 59 as of 2020-01-17 12:19:49
  Size: 10180
  Editor: AlinaWroblewska
  Comment:
+ Revision 77 as of 2020-10-06 14:35:45
  Size: 10441
  Editor: AlinaWroblewska
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 6:
- * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_COMBO_PDB_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_COMBO_PDB_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
+ * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200128_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDB_nosem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
{{{#!wiki comment
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only}}}
-Line 11:
+Line 12:
- * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/20190612_MATE_PDB.pkl|MATE model]] for dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190125_MALT_PDB.mco|MaltParser model]] for dependency parsing
+ * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/MATE/20190612_MATE_PDB.pkl|MATE model]] for dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/MALT/190125_MALT_PDB.mco|MaltParser model]] for dependency parsing
-Line 18:
+Line 19:
- * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_COMBO_PDBUD_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_PDBUD_ttp_embedd.udpipe|UDPipe model]] for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/190423_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation
+ * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDBUD_nosem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDBUD_sem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_ttp_embedd.udpipe|UDPipe model]] for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation
-Line 93:
+Line 94:
- * To download the parser's output in CoNLL format, "Select output format:":
+ * To download the parser's output in CoNLL format, "Select output format:".
-Line 96:
+Line 97:
-<<BibMate(key, "wro_ryb_2019", omitYears=true)>>
+<<BibMate(key, "wro:ryb:2019", omitYears=true)>> (Note: Please contact the first author to get a copy of this article.)
-Line 100:
+Line 102:
-<<BibMate(key, "awmw:departing", omitYears=true)>>
+<<BibMate(key, "awmw:deparsing", omitYears=true)>>
-Line 105:
+Line 107:
-The dependency parsing models for Polish are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence and by downloading it you accept the conditions of that licence.
+The dependency parsing models for Polish are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence and by downloading them you accept the conditions of that licence.
-Line 107:
+Line 109:
-== Founding ==
The research was founded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.
+== Acknowledgment ==
The research was founded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at Poznań Supercomputing and Networking Center.
