Size: 10580
Comment:
|
← Revision 109 as of 2022-09-09 07:00:19 ⇥
Size: 13257
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 2: | Line 2: |
== PDB-trained dependency parsing models for Polish == | = COMBO's models for Polish = |
Line 4: | Line 4: |
The PDB-based models are trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] with the publicly available parsing systems – [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */ | [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO's]] models for Polish trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] and using the [[https://huggingface.co/allegro/herbert-base-cased|HerBERT]] language model. |
Line 6: | Line 6: |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200128_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDB_nosem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling |
=== PDB-trained models === * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_parseonly_220906.tar.gz|model]] for dependency parsing only * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_full_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types '''without''' semantic extensions, e.g. adjunct instead of adjunct_temp) * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_full_SEMLAB_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types '''with''' semantic extensions, e.g. adjunct_temp) === PDB-UD-trained model === * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDBUD_full_220906.tar.gz|model]] for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing |
Line 10: | Line 15: |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only}}} | [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */ * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/combo_PDB_parseonly_220906.tar.gz|COMBO-pytorch model]] for dependency parsing only (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings), * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only |
Line 17: | Line 29: |
The PDB-UD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] and [[https://github.com/360er0/COMBO|COMBO]]. | The PDB-UD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO-pytorch]], [[https://github.com/360er0/COMBO|COMBO]], [[http://ufal.mff.cuni.cz/udpipe|UDPipe]]. |
Line 19: | Line 31: |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDBUD_nosem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDBUD_sem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling |
* [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz|COMBO-pytorch model]] for for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings), * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-large.tar.gz|COMBO-pytorch model]] for for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-large-cased|HerBERT-large]] embeddings), * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-ud27.tar.gz|COMBO-pytorch model]] for for part-of-speech tagging, lemmatisation, and dependency parsing (with fastText embeddings), * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling |
Line 22: | Line 37: |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation | * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation}}} |
Line 28: | Line 43: |
== Parsing performance == | === COMBO === |
Line 30: | Line 45: |
See [[http://clip.ipipan.waw.pl/benchmarks|Dependency parsing]] section. | * COMBO's [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|source code]] * Beginner's [[https://colab.research.google.com/drive/1D1P4AiE40Cc_4SF3HY-Mz06JY0XMiEFs?hl=en|tutorial]] (collab notebook) * COMBO's [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/blob/master/docs/performance.md|performance]] on test sets for multiple languages from [[https://universaldependencies.org|Universal Dependencies]] |
Line 33: | Line 50: |
=== Parsing performance === See [[http://clip.ipipan.waw.pl/benchmarks#Dependency_parsing|Dependency parsing]] section. |
|
Line 91: | Line 113: |
== PDB-based COMBO demo == * [[http://scwad-demo.nlp.ipipan.waw.pl:8000/dependency-parsing|Dependency parsing demo]] (only in Polish) |
=== COMBO demos === |
Line 94: | Line 115: |
== PDB-based MaltParser in Multiservice == * The performance of !MaltParser model for Polish may be tested in Multiservice NLP – [[http://multiservice.nlp.ipipan.waw.pl]]. * To parse a Polish text in Multiservice "Select predefined chain of actions": 5: Concraft, !DependencyParser, input your text, and press the button "Run". * To download the parser's output in CoNLL format, "Select output format:". |
* [[http://combo-demo.nlp.ipipan.waw.pl/combo-eng|English]] * [[http://combo-demo.nlp.ipipan.waw.pl/combo-pl|Polish]] |
Line 99: | Line 118: |
== Publications == | {{{#!wiki comment * [[http://scwad-demo.nlp.ipipan.waw.pl:8000/dependency-parsing|COMBO demo]] * [[http://multiservice.nlp.ipipan.waw.pl|MaltParser demo in Multiservice NLP]] * To parse a Polish text in Multiservice "Select predefined chain of actions": 5: Concraft, !DependencyParser, input your text, and press the button "Run". * To download the parser's output in CoNLL format, "Select output format:".}}} |
Line 101: | Line 124: |
=== Publications === <<BibMate(key, "kli:wro:2021b", omitYears=true)>> |
|
Line 102: | Line 128: |
{{{#!wiki comment | |
Line 103: | Line 130: |
<<BibMate(key, "wro:prz:14", omitYears=true)>> <<BibMate(key, "wroblewska:12", omitYears=true)>> <<BibMate(key, "awmw:deparsing", omitYears=true)>> |
<<BibMate(key, "wroblewska:12", omitYears=true)>>}}} |
Line 108: | Line 133: |
== Licensing == | === Licensing === |
Line 112: | Line 137: |
== Acknowledgment == The research was founded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at Poznań Supercomputing and Networking Center. |
=== Acknowledgment === The research was founded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science, Higher Education as part of the investment in the CLARIN-PL research infrastructure and DARIAH-PL. The computing was performed at Poznań Supercomputing and Networking Center. |
Line 116: | Line 141: |
== Contact == | === Contact === |
COMBO's models for Polish
COMBO's models for Polish trained on the current version of Polish Dependency Bank and using the HerBERT language model.
PDB-trained models
model for dependency parsing only
model for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types without semantic extensions, e.g. adjunct instead of adjunct_temp)
model for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing (dependency relation types with semantic extensions, e.g. adjunct_temp)
PDB-UD-trained model
model for part-of-speech tagging, morphological analysis, lemmatisation, and dependency parsing
COMBO
COMBO's source code
Beginner's tutorial (collab notebook)
COMBO's performance on test sets for multiple languages from Universal Dependencies
COMBO demos
Publications
Licensing
The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence and by downloading them you accept the conditions of that licence.
Acknowledgment
The research was founded by SONATA 8 grant no 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science, Higher Education as part of the investment in the CLARIN-PL research infrastructure and DARIAH-PL. The computing was performed at Poznań Supercomputing and Networking Center.
Contact
Any questions, comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.