#acl AlinaWroblewska:read,write,revert All:read
= COMBO's models for Polish =

[[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO's]] models for Polish are trained on the current version of the [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] using the [[https://huggingface.co/allegro/herbert-base-cased|HerBERT]] language model.

=== PDB-trained models ===
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_parseonly_220906.tar.gz|model]] for dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_full_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types '''without''' semantic extensions, e.g. adjunct instead of adjunct_temp)
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_full_SEMLAB_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types '''with''' semantic extensions, e.g. adjunct_temp)

=== PDB-UD-trained model ===
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDBUD_full_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing
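
Any of the archives listed above can be downloaded and loaded from Python. The following is a minimal sketch, not an official recipe. It assumes that the `combo` package is installed and that it exposes `combo.predict.COMBO.from_pretrained`, as described in COMBO's README. The helper names `local_archive_path`, `download_model` and `parse`, and the `models` target directory, are hypothetical and introduced only for illustration.

{{{#!highlight python
"""Sketch: download a COMBO model archive and parse one Polish sentence."""
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL copied from the PDB-UD-trained model link on this page.
MODEL_URL = (
    "http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/"
    "COMBO_pytorch/combo_PDBUD_full_220906.tar.gz"
)


def local_archive_path(url: str, target_dir: str = "models") -> Path:
    """Return the local path under which the archive named in `url` is stored."""
    return Path(target_dir) / Path(urlparse(url).path).name


def download_model(url: str = MODEL_URL, target_dir: str = "models") -> Path:
    """Download the archive unless it is already present; return its path."""
    archive = local_archive_path(url, target_dir)
    if not archive.exists():
        archive.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(archive))
    return archive


def parse(text: str, archive: Path):
    """Parse `text` with the model stored in `archive`.

    Assumption: COMBO is installed and `COMBO.from_pretrained` accepts
    a path to a local model archive.
    """
    from combo.predict import COMBO  # imported lazily; requires COMBO

    nlp = COMBO.from_pretrained(str(archive))
    return nlp(text)


# Example (requires network access and an installed COMBO):
#   sentence = parse("Ala ma kota.", download_model())
#   for token in sentence.tokens:
#       print(token.id, token.token, token.lemma,
#             token.upostag, token.head, token.deprel)
}}}

To use one of the PDB-trained models instead, replace `MODEL_URL` with the corresponding link from the list above. By downloading a model you accept the licence stated in the Licensing section.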

{{{#!wiki comment
[[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */

 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_parseonly_220906.tar.gz|COMBO-pytorch model]] for dependency parsing only (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings),
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling

 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only

 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/MATE/20190612_MATE_PDB.pkl|MATE model]] for dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/MALT/190125_MALT_PDB.mco|MaltParser model]] for dependency parsing


== PDB-UD-trained dependency parsing models for Polish ==
The PDB-UD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO-pytorch]], [[https://github.com/360er0/COMBO|COMBO]], [[http://ufal.mff.cuni.cz/udpipe|UDPipe]].

 * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings),
 * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-large.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-large-cased|HerBERT-large]] embeddings),
 * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-ud27.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with fastText embeddings),
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_ttp_embedd.udpipe|UDPipe model]] for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
 * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation}}}

{{{#!wiki comment 
 * [[http://mozart.ipipan.waw.pl/~prybak/model_poleval2018/model_A_semi.pkl|COMBO]] model for Polish (the model estimated for the [[http://poleval.pl/tasks#task1|PolEval 2018]] competition)
 * [[attachment:180606_PDBUDPipe.udpipe|UDPipe]] model for Polish}}}

=== COMBO ===

 * COMBO's [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|source code]]
 * Beginner's [[https://colab.research.google.com/drive/1D1P4AiE40Cc_4SF3HY-Mz06JY0XMiEFs?hl=en|tutorial]] (Colab notebook)
 * COMBO's [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md|performance]] on test sets for multiple languages from [[https://universaldependencies.org|Universal Dependencies]]
 * Web demos
  * [[http://combo-demo.nlp.ipipan.waw.pl/combo-eng|English]]
  * [[http://combo-demo.nlp.ipipan.waw.pl/combo-pl|Polish]]
{{{#!wiki comment
=== Parsing performance ===

See [[http://clip.ipipan.waw.pl/benchmarks#Dependency_parsing|Dependency parsing]] section.


  * [[attachment:190115_COMBO_PDB_nosem.pkl]] – PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
  * [[attachment:190115_COMBO_PDB_sem.pkl]] – PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling

 * '''NEW!''' PDB-based COMBO model compatible with the tagset of Morfeusz 2: [[attachment:180912_PDBCOMBO.pkl]]

 * MateParser

  * '''NEW!''' PDB-based Mate model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMate.mdl]]
  * PDB-based Mate model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMate.mdl]]

 * MateParser 
  * [[attachment:190125_MATE_PDB.model]] – PDB-based MateParser model for dependency parsing
 * MaltParser
  * [[attachment:190125_MALT_PDB.mco]] – PDB-based MaltParser model for dependency parsing


  * '''NEW!''' PDB-based MaltParser model compatible with the tagset of Morfeusz 2: [[attachment:180322_PDBMalt.mco]]
  * PDB-based MaltParser model compatible with the tagset of Morfeusz: [[attachment:170608_PDBMalt.mco]]

=== 10-fold cross-validation (avg.) ===

|| '''Model''' || '''LAS''' || '''UAS''' ||
|| `PDBMate` || 0.85 || 0.89 ||
|| `PDBMalt` || 0.82 || 0.86 ||

=== Precision, recall and f-score of individual dependency relations (avg.) ===

The description of Polish dependency relations types is available on [[http://zil.ipipan.waw.pl/PDB/DepRelTypes|Polish dependency relation types]].

||<rowspan=2> '''Dependency relation type'''      |||| '''Precision''' |||| '''Recall'''  |||| '''F-Measure'''  ||
|| Mate || Malt   || Mate || Malt    || Mate || Malt ||
||<bgcolor="#eef3ff">abbrev_punct    ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">0.99   ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">0.97    ||<bgcolor="#eef3ff"> 0.98 ||<bgcolor="#eef3ff">0.98 ||
||adjunct         || 0.89 || 0.73   || 0.92 || 0.77    || 0.82 || 0.75 ||
||<bgcolor="#eef3ff">adjunct_qt      ||<bgcolor="#eef3ff"> 0.74 ||<bgcolor="#eef3ff"> 0.51   ||<bgcolor="#eef3ff"> 0.76 ||<bgcolor="#eef3ff"> 0.58    ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.55 ||
||aglt            || 1.00 || 0.98   || 1.00 || 0.98    || 0.98 || 0.98 ||
||<bgcolor="#eef3ff">app             ||<bgcolor="#eef3ff"> 0.75 ||<bgcolor="#eef3ff"> 0.58   ||<bgcolor="#eef3ff"> 0.69 ||<bgcolor="#eef3ff"> 0.52    ||<bgcolor="#eef3ff"> 0.72 ||<bgcolor="#eef3ff"> 0.55 ||
||aux             || 0.95 || 0.90   || 0.97 || 0.92    || 0.96 || 0.91 ||
||<bgcolor="#eef3ff">comp            ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff"> 0.85   ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.82    ||<bgcolor="#eef3ff"> 0.88 ||<bgcolor="#eef3ff"> 0.84 ||
||comp_ag         || 0.95 || 0.90   || 0.96 || 0.91    || 0.94 || 0.90 ||
||<bgcolor="#eef3ff">comp_fin        ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.75   ||<bgcolor="#eef3ff"> 0.86 ||<bgcolor="#eef3ff"> 0.79    ||<bgcolor="#eef3ff"> 0.87 ||<bgcolor="#eef3ff"> 0.77 ||
||comp_inf        || 0.95 || 0.91   || 0.96 || 0.90    || 0.93 || 0.90 ||
||<bgcolor="#eef3ff">   cond     ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.97	    ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 	0.96       ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.96    ||
||conjunct        || 0.85 || 0.71	    || 0.82 || 0.65	       || 0.82 || 0.68     ||
||<bgcolor="#eef3ff"> imp ||<bgcolor="#eef3ff">0.98 ||<bgcolor="#eef3ff">	0.97    ||<bgcolor="#eef3ff"> 0.91 ||<bgcolor="#eef3ff"> 	0.87       ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff">  0.92   ||
|| item            || 0.87 ||  0.4	    ||  0.73 ||  0.37	       || 0.61 || 0.39     ||
||<bgcolor="#eef3ff">    mwe    ||<bgcolor="#eef3ff"> 0.90 ||<bgcolor="#eef3ff">	0.83    ||<bgcolor="#eef3ff"> 0.83 ||<bgcolor="#eef3ff"> 0.75	       ||<bgcolor="#eef3ff">0.87 ||<bgcolor="#eef3ff">  0.79   ||
||ne              ||  0.87 || 0.78	    || 0.73 || 0.64	       ||  0.76 ||  0.70     ||
||<bgcolor="#eef3ff">   neg     ||<bgcolor="#eef3ff">0.99 ||<bgcolor="#eef3ff">	 0.97   ||<bgcolor="#eef3ff"> 1.00 ||<bgcolor="#eef3ff"> 0.98	       ||<bgcolor="#eef3ff">0.99  ||<bgcolor="#eef3ff">  0.98   ||
||obj             || 0.89 || 0.81	    || 0.91 || 0.86	       ||  0.89 || 0.83     ||
||<bgcolor="#eef3ff">     obj_th   ||<bgcolor="#eef3ff">0.83 ||<bgcolor="#eef3ff">	0.76    ||<bgcolor="#eef3ff"> 0.76  ||<bgcolor="#eef3ff"> 	0.65       ||<bgcolor="#eef3ff">0.80 ||<bgcolor="#eef3ff">  0.70   ||
||pd              || 0.86 || 0.77	    || 0.80 || 0.72	       || 0.87 || 0.74      ||
||<bgcolor="#eef3ff">  pre_coord      ||<bgcolor="#eef3ff">0.86 ||<bgcolor="#eef3ff">	0.76    ||<bgcolor="#eef3ff">0.78 ||<bgcolor="#eef3ff"> 0.55	       ||<bgcolor="#eef3ff">0.82 ||<bgcolor="#eef3ff">   0.64  ||
||punct           || 0.97 || 0.75	    || 0.98 || 0.76	       || 0.88 || 0.76     ||
||<bgcolor="#eef3ff">refl            ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96	    ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96	       ||<bgcolor="#eef3ff"> 0.99 ||<bgcolor="#eef3ff"> 0.96     ||
||root            || 0.91 || 0.80	    || 0.91 || 0.81	       || 0.94 ||0.80     ||
||<bgcolor="#eef3ff">    subj    ||<bgcolor="#eef3ff"> 0.94||<bgcolor="#eef3ff">	   0.84 ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 	0.83       ||<bgcolor="#eef3ff">0.94 ||<bgcolor="#eef3ff"> 0.84    ||

=== COMBO demos ===

 * [[http://combo-demo.nlp.ipipan.waw.pl/combo-eng|English]]
 * [[http://combo-demo.nlp.ipipan.waw.pl/combo-pl|Polish]]

 * [[http://scwad-demo.nlp.ipipan.waw.pl:8000/dependency-parsing|COMBO demo]]
 * [[http://multiservice.nlp.ipipan.waw.pl|MaltParser demo in Multiservice NLP]]
  * To parse a Polish text in Multiservice "Select predefined chain of actions": 5: Concraft, !DependencyParser, input your text, and press the button "Run".
  * To download the parser's output in CoNLL format, "Select output format:".}}}

=== Publications ===

<<BibMate(key, "kli:wro:2021b", omitYears=true)>>
<<BibMate(key, "wro:ryb:2019", omitYears=true)>>
<<BibMate(key, "ryb:wro:018a", omitYears=true)>>
{{{#!wiki comment 
<<BibMate(key, "wro:14", omitYears=true)>>
<<BibMate(key, "wroblewska:12", omitYears=true)>>}}}


=== Licensing ===

Polish NLP models are released under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence. By downloading them, you accept the conditions of that licence.

=== Acknowledgment ===
The research was funded by the SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre, Poland, by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure, and by the Digital Research Infrastructure for the Arts and Humanities DARIAH-PL. The computing was performed at the Poznań Supercomputing and Networking Center.


=== Contact ===
Any questions or comments? Please send them to <<MailTo(alina AT SPAMFREE ipipan DOT waw DOT pl)>>.