PDB-based dependency parsing models for Polish
The PDB-based models are trained on the current version of the Polish Dependency Bank with publicly available parsing systems: COMBO, MateParser and MaltParser.
- COMBO
  - PDB-based COMBO model for part-of-speech tagging, lemmatisation, and dependency parsing: 190115_COMBO_PDB_nosem.pkl
  - PDB-based COMBO model for part-of-speech tagging, lemmatisation, dependency parsing and semantic role labelling: 190115_COMBO_PDB_sem.pkl
 
- PDB-based MaltParser model: 
 
PDBUD-based dependency parsing models for Polish
The PDBUD-based models are trained on the current version of the Polish Dependency Bank in Universal Dependencies format with publicly available parsing systems: UDPipe and COMBO.
- COMBO model for Polish (the model estimated for the PolEval 2018 competition) 
- UDPipe model for Polish 
Parsing performance
See the Dependency parsing section (http://clip.ipipan.waw.pl/benchmarks).
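Dependency parsing performance is conventionally reported as unlabelled and labelled attachment scores (UAS and LAS). The following is a minimal sketch of how these two scores are computed, not the evaluation script used for the models above. The function name `attachment_scores` and the toy data are hypothetical and serve only as an illustration.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two equal-length lists of (head, label) pairs.

    UAS counts tokens whose predicted head is correct;
    LAS additionally requires the dependency label to be correct.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Hypothetical toy data: one (head index, relation label) pair per token.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "adjunct")]
pred = [(2, "subj"), (0, "root"), (2, "comp"), (2, "adjunct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In the toy example three of the four heads are correct, but only two tokens have both the correct head and the correct label, so LAS is lower than UAS.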
PDB-based MaltParser in Multiservice
- The performance of the MaltParser model for Polish can be tested in Multiservice NLP: http://multiservice.nlp.ipipan.waw.pl.
- To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button.
- To download the parser's output in CoNLL format, "Select output format:":
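Once the CoNLL output has been downloaded, it can be processed offline. The sketch below assumes the tab-separated CoNLL-X layout with ten columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL); whether Multiservice emits exactly this variant is an assumption. The names `Token` and `read_conll` and the sample sentence, including its tags and labels, are hypothetical.

```python
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    index: int   # ID column (1-based position in the sentence)
    form: str    # FORM column
    lemma: str   # LEMMA column
    pos: str     # CPOSTAG column
    head: int    # HEAD column (0 marks the root)
    deprel: str  # DEPREL column


def read_conll(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence; blank lines separate sentences."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        sentence.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    if sentence:
        yield sentence


# Hypothetical sample in the assumed layout (tags and labels are illustrative).
SAMPLE = (
    "1\tAla\tAla\tsubst\tsubst\tsg|nom|f\t2\tsubj\t_\t_\n"
    "2\tma\tmieć\tfin\tfin\tsg|ter|imperf\t0\tpred\t_\t_\n"
    "3\tkota\tkot\tsubst\tsubst\tsg|acc|m2\t2\tobj\t_\t_\n"
    "\n"
)

sentences = list(read_conll(SAMPLE))
for token in sentences[0]:
    # Print each dependency arc as head -> dependent with its label.
    head_form = "ROOT" if token.head == 0 else sentences[0][token.head - 1].form
    print(f"{head_form} -> {token.form} ({token.deprel})")
```

Keeping only the columns needed for the dependency tree makes the result easy to inspect; the remaining columns can be added to `Token` in the same way if required.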
Publications
Licensing
The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence; by downloading them, you accept the conditions of that licence.
Funding
The research was funded by SONATA 8 grant no. 2014/15/D/HS2/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.
Contact
Any questions, comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.

 
 
                            

