PDB-based dependency parsing models for Polish
The PDB-based models are trained on the current version of the Polish Dependency Bank (PDB) with publicly available parsing systems: COMBO, MaltParser or MateParser. MaltParser is a transition-based dependency parser that uses a deterministic parsing algorithm. The algorithm builds the dependency structure of an input sentence from transitions (shift-reduce actions) predicted by a classifier. The classifier learns to predict the next transition from the training data and the parse history. MateParser, in turn, is a graph-based parser that defines a space of well-formed candidate dependency trees for an input sentence, scores them with an induced parsing model, and selects the highest-scoring tree as the analysis of the input sentence.
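To make the notion of transitions concrete, here is a minimal Python sketch of the arc-standard transition system, one common shift-reduce scheme. It is an illustration only: the source does not say which transition system the PDBMalt model uses, and in a real parser each transition is predicted by the classifier, whereas here the sequence is given by hand. The function name `apply_transitions` and the example sentence are assumptions made for this sketch.

```python
def apply_transitions(words, transitions):
    """Apply arc-standard transitions and return (head, dependent) arcs.

    Tokens are numbered from 1; index 0 stands for the artificial root.
    """
    stack = [0]
    buffer = list(range(1, len(words) + 1))
    arcs = []
    for action in transitions:
        if action == "SHIFT":
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-from-top token becomes a dependent of the top token.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # The top token becomes a dependent of the token below it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {action}")
    return arcs


# Hand-written transition sequence for "Ala ma kota" ("Ala has a cat").
words = ["Ala", "ma", "kota"]
sequence = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(apply_transitions(words, sequence))  # [(2, 1), (2, 3), (0, 2)]
```

The verb "ma" (token 2) ends up as the head of both "Ala" and "kota" and is itself attached to the root.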
PDB-based models
MateParser
- NEW! PDBMate model (compatible with the tagset of Morfeusz 2): 180322_PDBMate.mdl 
- PDBMate model (compatible with the tagset of Morfeusz): 170608_PDBMate.mdl 
MaltParser
- NEW! PDBMalt model (compatible with the tagset of Morfeusz 2): 180322_PDBMalt.mco 
- PDBMalt model (compatible with the tagset of Morfeusz): 170608_PDBMalt.mco 
PDBUD-based dependency parsing models for Polish
UDPipe
- UDPipe model for Polish: 180606_PDBUDPipe.udpipe 
Parsing performance
10-fold cross-validation (avg.)
| Model | LAS | UAS | 
| PDBMate | 0.85 | 0.89 | 
| PDBMalt | 0.82 | 0.86 | 
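For readers unfamiliar with the two metrics: UAS (unlabelled attachment score) is the share of tokens that receive the correct head, and LAS (labelled attachment score) is the share that receive both the correct head and the correct relation label. The sketch below shows the computation on a toy sentence. The function name `attachment_scores` and the data are assumptions for illustration, and the source does not state whether punctuation tokens were counted in the figures above.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) for two equally long lists of (head, label) pairs."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    # UAS counts tokens whose head is right; LAS also requires the right label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return las_hits / total, uas_hits / total


# Toy example: four tokens, one wrong label and one wrong head.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "subj"), (0, "root"), (2, "comp"), (3, "punct")]
las, uas = attachment_scores(gold, pred)
print(las, uas)  # 0.5 0.75
```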
Precision, recall and F-measure of individual dependency relations (avg.)
The Polish dependency relation types are described on the page Polish dependency relation types.
| Dependency relation type | Precision (Mate) | Precision (Malt) | Recall (Mate) | Recall (Malt) | F-measure (Mate) | F-measure (Malt) |
| abbrev_punct | 0.99 | 0.99 | 0.98 | 0.97 | 0.98 | 0.98 | 
| adjunct | 0.89 | 0.73 | 0.92 | 0.77 | 0.82 | 0.75 | 
| adjunct_qt | 0.74 | 0.51 | 0.76 | 0.58 | 0.75 | 0.55 | 
| aglt | 1.00 | 0.98 | 1.00 | 0.98 | 0.98 | 0.98 | 
| app | 0.75 | 0.58 | 0.69 | 0.52 | 0.72 | 0.55 | 
| aux | 0.95 | 0.90 | 0.97 | 0.92 | 0.96 | 0.91 | 
| comp | 0.90 | 0.85 | 0.87 | 0.82 | 0.88 | 0.84 | 
| comp_ag | 0.95 | 0.90 | 0.96 | 0.91 | 0.94 | 0.90 | 
| comp_fin | 0.87 | 0.75 | 0.86 | 0.79 | 0.87 | 0.77 | 
| comp_inf | 0.95 | 0.91 | 0.96 | 0.90 | 0.93 | 0.90 | 
| cond | 1.00 | 0.97 | 1.00 | 0.96 | 1.00 | 0.96 | 
| conjunct | 0.85 | 0.71 | 0.82 | 0.65 | 0.82 | 0.68 | 
| imp | 0.98 | 0.97 | 0.91 | 0.87 | 0.94 | 0.92 | 
| item | 0.87 | 0.40 | 0.73 | 0.37 | 0.61 | 0.39 | 
| mwe | 0.90 | 0.83 | 0.83 | 0.75 | 0.87 | 0.79 | 
| ne | 0.87 | 0.78 | 0.73 | 0.64 | 0.76 | 0.70 | 
| neg | 0.99 | 0.97 | 1.00 | 0.98 | 0.99 | 0.98 | 
| obj | 0.89 | 0.81 | 0.91 | 0.86 | 0.89 | 0.83 | 
| obj_th | 0.83 | 0.76 | 0.76 | 0.65 | 0.80 | 0.70 | 
| pd | 0.86 | 0.77 | 0.80 | 0.72 | 0.87 | 0.74 | 
| pre_coord | 0.86 | 0.76 | 0.78 | 0.55 | 0.82 | 0.64 | 
| punct | 0.97 | 0.75 | 0.98 | 0.76 | 0.88 | 0.76 | 
| refl | 0.99 | 0.96 | 0.99 | 0.96 | 0.99 | 0.96 | 
| root | 0.91 | 0.80 | 0.91 | 0.81 | 0.94 | 0.80 | 
| subj | 0.94 | 0.84 | 0.94 | 0.83 | 0.94 | 0.84 | 
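Per-relation scores of this kind can be computed as sketched below. This is a minimal sketch under a stated assumption: a predicted relation counts as correct only when both its head and its label match the gold annotation. The source does not describe the exact evaluation procedure, and the function name `per_relation_scores` and the toy data are hypothetical.

```python
from collections import Counter


def per_relation_scores(gold, predicted):
    """Return {label: (precision, recall, f_measure)} for (head, label) pairs."""
    correct, gold_count, pred_count = Counter(), Counter(), Counter()
    for (g_head, g_label), (p_head, p_label) in zip(gold, predicted):
        gold_count[g_label] += 1
        pred_count[p_label] += 1
        # Assumption: correct means head and label both match the gold token.
        if g_head == p_head and g_label == p_label:
            correct[g_label] += 1
    scores = {}
    for label in set(gold_count) | set(pred_count):
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        recall = correct[label] / gold_count[label] if gold_count[label] else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


gold = [(2, "subj"), (0, "root"), (2, "obj"), (2, "obj")]
pred = [(2, "subj"), (0, "root"), (2, "obj"), (2, "subj")]
for label, (p, r, f) in sorted(per_relation_scores(gold, pred).items()):
    print(f"{label}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

In the toy data, "subj" is over-predicted (precision 0.50, recall 1.00) and "obj" is under-predicted (precision 1.00, recall 0.50).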
Dependency parser integrated into Multiservice NLP for Polish
- The performance of the MaltParser model for Polish may be tested in Multiservice NLP: http://multiservice.nlp.ipipan.waw.pl. 
- To parse a Polish text in Multiservice, choose "5: Concraft, DependencyParser" under "Select predefined chain of actions", enter your text, and press the "Run" button. 
- To download the parser's output in CoNLL format, "Select output format:":
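Once parser output in CoNLL format has been saved, it can be read with a few lines of Python. The sketch below assumes the 10-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with blank lines between sentences. Whether Multiservice emits exactly this layout is an assumption, and the sample sentence with its tags is made up for illustration.

```python
def read_conll(text):
    """Parse CoNLL-X text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "head": int(cols[6]),    # 0 marks the root
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; tags are illustrative only.
sample = (
    "1\tAla\tAla\tsubst\tsubst\t_\t2\tsubj\t_\t_\n"
    "2\tma\tmieć\tfin\tfin\t_\t0\troot\t_\t_\n"
    "3\tkota\tkot\tsubst\tsubst\t_\t2\tobj\t_\t_\n"
)
for token in read_conll(sample)[0]:
    print(token["id"], token["form"], token["head"], token["deprel"])
```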
Publications
Licensing
The dependency parsing models for Polish are released under the CC BY-NC-SA 4.0 licence; by downloading them you accept the conditions of that licence.
Contact
Any questions, comments? Please send them to <alina AT SPAMFREE ipipan DOT waw DOT pl>.

 
 
                            

