Polish Dependency Bank 2.0 (PDB 2.0)

PDB 2.0 is an extended version of Składnica zależnościowa (the first Polish dependency treebank). It contains 22,208 trees and 351,175 tokens (i.e. 15.8 tokens per sentence on average). PDB 2.0 consists of four parts:

  1. NKJP1M-based trees (14K)
  2. Projection-based trees (4K)
  3. CDScorpus-based trees (2K)
  4. OTHER trees (2K)
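The average of 15.8 tokens per sentence follows from 351,175 / 22,208 ≈ 15.81. The Python sketch below reproduces that figure and shows how tree and token counts could be recomputed from a file. It assumes a tab-separated CoNLL-style export (one token per line, a blank line between trees). That export format is an assumption, and `corpus_stats` and `tokens_per_tree` are hypothetical helper names, not part of any PDB tooling.

```python
def corpus_stats(conll_text):
    """Count trees and tokens in tab-separated CoNLL-style text.

    Hypothetical helper: sentences are separated by blank lines, lines starting
    with "#" are comments, and token lines start with an integer ID.
    Multiword-token ranges ("1-2") and empty nodes ("1.1") are not counted.
    """
    trees = tokens = 0
    in_sentence = False
    for line in conll_text.splitlines():
        if not line.strip():  # a blank line ends the current tree
            in_sentence = False
            continue
        if line.startswith("#"):  # comment lines carry no tokens
            continue
        if not in_sentence:  # first non-comment line of a new tree
            trees += 1
            in_sentence = True
        if line.split("\t", 1)[0].isdigit():
            tokens += 1
    return trees, tokens


def tokens_per_tree(trees, tokens):
    """Average sentence length in tokens."""
    return tokens / trees


if __name__ == "__main__":
    # Toy input: two trees with 2 and 1 tokens.
    sample = "# text = a b\n1\ta\n2\tb\n\n1\tc\n\n"
    print(corpus_stats(sample))  # (2, 3)
    # Published PDB 2.0 totals: 351,175 tokens in 22,208 trees.
    print(round(tokens_per_tree(22208, 351175), 1))  # 15.8
```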

The PDB sentences contain a number of problematic linguistic phenomena, e.g. ellipsis, comparative constructions, constructions with the bi-functional subordinating conjunction JAKO, direct speech, interpolations and comments, nominative noun phrases used in the vocative function, and many others. Descriptions of the Polish dependency relation types (outdated) are available at http://zil.ipipan.waw.pl/PDB/DepRelTypes. Some dependents are annotated with semantic roles, e.g. Beneficiary/Recipient.

Download: An updated version of Składnica zależnościowa (the first version of PDB) is available. If you wish to obtain the entire PDB corpus (22K sentences annotated with dependency trees), please contact alina <at> ipipan.waw.pl (replace <at> with @).

PDB in Universal Dependencies format (PDB-UD)

PDB-UD is a conversion of PDB into a UD-like format. It is an extended and corrected version of the Polish UD treebank (release 2.1). PDB-UD contains enhanced graphs, i.e. trees with enhanced edges that encode shared dependents of coordinated elements, e.g. Dziewczynka śpiewa i tańczy (The girl sings and dances), and shared governors of coordinated elements, e.g. Dziewczynka i chłopiec śpiewają (A girl and a boy sing). PDB-UD trees were used in two shared tasks: LAW-MWE-CxG-2018 and PolEval 2018.
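The following Python sketch illustrates how enhanced edges for the two example sentences might be encoded in CoNLL-U, the standard UD file format, and how they can be read back from the DEPS column. The annotation is hypothetical and simplified: it follows generic UD enhanced-dependency conventions (`head:relation` pairs separated by `|`) and is not copied from PDB-UD, whose exact labels may differ. The helper names `to_conllu`, `parse_deps` and `read_enhanced` are illustrative only.

```python
# Hypothetical, simplified CoNLL-U annotation of the two example sentences.
# Each token: (ID, FORM, LEMMA, UPOS, HEAD, DEPREL, DEPS); XPOS, FEATS and MISC
# are filled with "_". Labels follow generic UD enhanced-dependency conventions
# and are NOT taken from PDB-UD itself.
SENTENCES = [
    ("Dziewczynka śpiewa i tańczy", [
        ("1", "Dziewczynka", "dziewczynka", "NOUN", "2", "nsubj", "2:nsubj|4:nsubj"),
        ("2", "śpiewa", "śpiewać", "VERB", "0", "root", "0:root"),
        ("3", "i", "i", "CCONJ", "4", "cc", "4:cc"),
        ("4", "tańczy", "tańczyć", "VERB", "2", "conj", "2:conj"),
    ]),
    ("Dziewczynka i chłopiec śpiewają", [
        ("1", "Dziewczynka", "dziewczynka", "NOUN", "4", "nsubj", "4:nsubj"),
        ("2", "i", "i", "CCONJ", "3", "cc", "3:cc"),
        ("3", "chłopiec", "chłopiec", "NOUN", "1", "conj", "1:conj|4:nsubj"),
        ("4", "śpiewają", "śpiewać", "VERB", "0", "root", "0:root"),
    ]),
]


def to_conllu(sentences):
    """Serialise the sentences as tab-separated CoNLL-U (10 columns per token)."""
    blocks = []
    for text, tokens in sentences:
        lines = [f"# text = {text}"]
        for tok_id, form, lemma, upos, head, deprel, deps in tokens:
            cols = [tok_id, form, lemma, upos, "_", "_", head, deprel, deps, "_"]
            lines.append("\t".join(cols))
        blocks.append("\n".join(lines))
    # Every sentence, including the last one, is followed by a blank line.
    return "\n\n".join(blocks) + "\n\n"


def parse_deps(deps):
    """Split a DEPS value such as "2:nsubj|4:nsubj" into (head, relation) pairs.

    Assumes integer heads (no empty nodes such as "3.1").
    """
    if deps == "_":
        return []
    pairs = []
    for item in deps.split("|"):
        head, rel = item.split(":", 1)  # the relation may contain ":" (e.g. conj:i)
        pairs.append((int(head), rel))
    return pairs


def read_enhanced(conllu):
    """Return one {token_id: [(head, relation), ...]} dict per sentence."""
    sentences, current = [], {}
    for line in conllu.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = {}
            continue
        if line.startswith("#"):  # sentence-level comments
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges ("1-2") and empty nodes ("1.1")
            continue
        current[int(cols[0])] = parse_deps(cols[8])
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    graphs = read_enhanced(to_conllu(SENTENCES))
    # Shared dependent: "Dziewczynka" (ID 1) is the subject of both verbs.
    print(graphs[0][1])  # [(2, 'nsubj'), (4, 'nsubj')]
    # Shared governor: "chłopiec" (ID 3) is also a subject of "śpiewają" (ID 4).
    print(graphs[1][3])  # [(1, 'conj'), (4, 'nsubj')]
```

In the basic tree (HEAD and DEPREL columns) every token has exactly one head; the DEPS column is what allows a shared dependent or a shared governor to receive several incoming edges.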

Download: PDB-UD is publicly available at http://git.nlp.ipipan.waw.pl/alina/PDBUD

PDB-based parsers

Some dependency parsing models trained on PDB and PDB-UD are available at http://zil.ipipan.waw.pl/PDB/PDBparser

Publications


Alina Wróblewska. Polish Dependency Parser Trained on an Automatically Induced Dependency Bank. Ph.D. dissertation, Institute of Computer Science, Polish Academy of Sciences, Warsaw, 2014.

Alina Wróblewska. Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format. In: Proceedings of the Universal Dependencies Workshop 2018 (UDW 2018), 2018.

Acknowledgements

The creation of PDB was supported by grant no. POIG.01.01.02-14-013/09 from the Innovative Economy Operational Programme, co-financed by the European Union (European Regional Development Fund), and by a grant from the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure (2016-2018).

Licence

The resources are distributed under the CC BY-NC-SA 4.0 licence.

Contact

Any questions or comments? Please send them to Alina Wróblewska <alina AT SPAMFREE ipipan DOT waw DOT pl>.