Size: 10402
Comment:
|
Size: 11628
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 4: | Line 4: |
The PDB-based models are trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] with the publicly available parsing systems – [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */ | The PDB-based models are trained on the current version of [[http://zil.ipipan.waw.pl/PDB|Polish Dependency Bank]] with the publicly available parsing systems – [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO-pytorch]], [[https://github.com/360er0/COMBO|COMBO]], [[https://code.google.com/archive/p/mate-tools/|MateParser]] and [[http://maltparser.org|MaltParser]]. /* ''MaltParser'' is a transition-based dependency parser that uses a deterministic parsing algorithm. The deterministic parsing algorithm builds a dependency structure of an input sentence based on transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition given training data and the parse history. `MateParser`, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them given an induced parsing model, and selects the highest scoring dependency tree as a correct analysis of the input sentence. */ |
Line 6: | Line 6: |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDB_nosem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/200118_COMBO_PDB_sem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO_pytorch/model.tar.gz|COMBO-pytorch model]] for dependency parsing only (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings), * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem_parseonly.pkl|COMBO model]] for dependency parsing only * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDB_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling {{{#!wiki comment * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/191107_COMBO_PDB_semlab_parseonly.pkl|COMBO model]] for (semantic) dependency parsing only}}} |
Line 16: | Line 18: |
The PDB-UD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[http://ufal.mff.cuni.cz/udpipe|UDPipe]] and [[https://github.com/360er0/COMBO|COMBO]]. | The PDB-UD-based models are trained on the current version of [[http://git.nlp.ipipan.waw.pl/alina/PDBUD|Polish Dependency Bank in Universal Dependencies format]] with the publicly available parsing systems – [[https://gitlab.clarin-pl.eu/syntactic-tools/combo/-/tree/master|COMBO-pytorch]], [[https://github.com/360er0/COMBO|COMBO]], [[http://ufal.mff.cuni.cz/udpipe|UDPipe]]. |
Line 18: | Line 20: |
* [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/190423_COMBO_PDBUD_nosem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/190423_COMBO_PDBUD_sem.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/190423_PDBUD_ttp_embedd.udpipe|UDPipe model]] for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/190423_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation |
* [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-base.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-base-cased|HerBERT-base]] embeddings), * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-herbert-large.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with [[https://huggingface.co/allegro/herbert-large-cased|HerBERT-large]] embeddings), * [[http://mozart.ipipan.waw.pl/~mklimaszewski/models/polish-ud27.tar.gz|COMBO-pytorch model]] for part-of-speech tagging, lemmatisation, and dependency parsing (with fastText embeddings), * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_nosem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/COMBO/20200930_COMBO_PDBUD_sem_full.pkl|COMBO model]] for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_ttp_embedd.udpipe|UDPipe model]] for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing * [[http://mozart.ipipan.waw.pl/~alina/Polish_dependency_parsing_models/UDPIPE/20200930_PDBUD_tokeniser.udpipe|UDPipe model]] for tokenisation |
Line 29: | Line 34: |
See [[http://clip.ipipan.waw.pl/benchmarks|Dependency parsing]] section. | See [[http://clip.ipipan.waw.pl/benchmarks#Dependency_parsing|Dependency parsing]] section. |
Line 90: | Line 95: |
== PDB-based MaltParser in Multiservice == * The performance of !MaltParser model for Polish may be tested in Multiservice NLP – [[http://multiservice.nlp.ipipan.waw.pl]]. * To parse a Polish text in Multiservice "Select predefined chain of actions": 5: Concraft, !DependencyParser, input your text, and press the button "Run". * To download the parser's output in CoNLL format, "Select output format:": |
== PDB-based dependency parsing demos == * [[http://scwad-demo.nlp.ipipan.waw.pl:8000/dependency-parsing|COMBO demo]] (only in Polish) * [[http://multiservice.nlp.ipipan.waw.pl|MaltParser demo in Multiservice NLP]] * To parse a Polish text in Multiservice "Select predefined chain of actions": 5: Concraft, !DependencyParser, input your text, and press the button "Run". * To download the parser's output in CoNLL format, "Select output format:". |
Line 108: | Line 115: |
== Founding == | == Acknowledgment == |
PDB-trained dependency parsing models for Polish
The PDB-based models are trained on the current version of Polish Dependency Bank with the publicly available parsing systems – COMBO-pytorch, COMBO, MateParser and MaltParser.
COMBO-pytorch model for dependency parsing only (with HerBERT-base embeddings)
COMBO model for dependency parsing only
COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
MATE model for dependency parsing
MaltParser model for dependency parsing
PDB-UD-trained dependency parsing models for Polish
The PDB-UD-based models are trained on the current version of Polish Dependency Bank in Universal Dependencies format with the publicly available parsing systems – COMBO-pytorch, COMBO and UDPipe.
COMBO-pytorch model for part-of-speech tagging, lemmatisation, and dependency parsing (with HerBERT-base embeddings)
COMBO-pytorch model for part-of-speech tagging, lemmatisation, and dependency parsing (with HerBERT-large embeddings)
COMBO-pytorch model for part-of-speech tagging, lemmatisation, and dependency parsing (with fastText embeddings)
COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing
COMBO model for part-of-speech tagging, lemmatisation, dependency parsing, and semantic role labelling
UDPipe model for tokenisation, part-of-speech tagging, lemmatisation, and dependency parsing
UDPipe model for tokenisation
Parsing performance
See the Dependency parsing section of the benchmarks page.
PDB-based dependency parsing demos
COMBO demo (only in Polish)
MaltParser demo in Multiservice NLP
To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
To download the parser's output in CoNLL format, choose that format under "Select output format:".
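Once the parser's output has been downloaded in CoNLL format, it can be inspected with a few lines of Python. The sketch below is only an illustration and makes several assumptions: the file is tab-separated with the usual ten CoNLL columns (token ID in column 1, form in column 2, lemma in column 3, head in column 7, dependency label in column 8), sentences are separated by blank lines, and the sample sentence with its labels is hand-written rather than actual parser output. The function names `read_conll` and `dependency_edges` are hypothetical helpers, not part of any of the parsers listed above.

```python
from typing import Dict, List, Tuple

Token = Dict[str, str]


def read_conll(text: str) -> List[List[Token]]:
    """Split CoNLL-formatted text into sentences, each a list of token dicts.

    Assumes tab-separated columns with ID, FORM, LEMMA, HEAD and DEPREL
    in columns 1, 2, 3, 7 and 8 (an assumption about the downloaded file).
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # comment lines, as used in CoNLL-U files
        cols = line.split("\t")
        if len(cols) < 8 or not cols[0].isdigit():
            continue  # skip malformed lines and multiword-token ranges
        current.append({
            "id": cols[0],
            "form": cols[1],
            "lemma": cols[2],
            "head": cols[6],
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


def dependency_edges(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head form, dependency label, dependent form) triples."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    forms["0"] = "ROOT"  # head 0 marks the root of the tree
    return [(forms.get(tok["head"], "?"), tok["deprel"], tok["form"])
            for tok in sentence]


# Hand-written sample for illustration only (not real parser output).
SAMPLE = (
    "1\tAla\tAla\tsubst\tsubst\t_\t2\tsubj\t_\t_\n"
    "2\tma\tmieć\tfin\tfin\t_\t0\tpred\t_\t_\n"
    "3\tkota\tkot\tsubst\tsubst\t_\t2\tobj\t_\t_\n"
    "4\t.\t.\tinterp\tinterp\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for head, label, dependent in dependency_edges(read_conll(SAMPLE)[0]):
        print(f"{head} --{label}--> {dependent}")
```

To use it on a downloaded file, read the file as UTF-8 text and pass its contents to `read_conll`; each returned sentence can then be turned into head-label-dependent triples with `dependency_edges`.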
Publications
Licensing
The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence and by downloading them you accept the conditions of that licence.
Acknowledgment
The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at Poznań Supercomputing and Networking Center.
Contact
Any questions, comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.