## PDB wiki page, revision 29 as of 2018-10-05 14:18:38 (first revision: 2017-05-24 12:22:13)
#format wiki
#language en
#acl +All:read Default

= Polish Dependency Bank 2.0 (PDB 2.0) =

PDB 2.0 is an extended version of the ''Składnica'' treebank. It consists of 22,208 trees and 351,175 tokens (i.e. 15.8 tokens per sentence on average). PDB 2.0 has four parts:
 1. NKJP1M trees (14K)
 2. PDB_projected trees (4K)
 3. CDScorpus-based trees (2K)
 4. OTHER trees (2K)

The PDB sentences cover a range of linguistic phenomena, e.g. ellipsis, comparative constructions, constructions with the bi-functional subordinating conjunction JAKO, direct speech, interpolations and comments, nominative noun phrases used in the vocative function, and many others.

==== PDB data ====

An updated version of [[attachment:NKJP1M_Skladnica_sem.conll|Składnica zależnościowa]] (the previous version of PDB) is available as an attachment.

If you wish to get the entire PDB corpus (22K sentences annotated with dependency trees), please contact ''alina'' <at> ''ipipan.waw.pl'' (replace <at> with @).
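
The corpus figures quoted above (trees, tokens, average sentence length) can be recomputed for any CoNLL-style file. The following is a minimal sketch, assuming the common CoNLL layout of one tab-separated token per line with a blank line between sentences; the function name `corpus_stats` and the sample rows are hypothetical, and all annotation columns in the sample are left as `_` placeholders rather than real PDB annotation.

```python
from typing import Iterable, Tuple


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence count, token count) for CoNLL-style input.

    Assumption: one token per line, blank line between sentences,
    optional comment lines starting with '#'.
    """
    sentences = 0
    tokens = 0
    in_sentence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Skip comment lines (token lines start with a numeric ID).
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
    return sentences, tokens


def row(idx: int, form: str) -> str:
    """Build a 10-column placeholder row; only ID and FORM are filled in."""
    return "\t".join([str(idx), form] + ["_"] * 8)


# Hypothetical two-sentence sample built from the example sentences on this page.
sample = (
    [row(i, f) for i, f in enumerate(["Dziewczynka", "śpiewa", "i", "tańczy", "."], 1)]
    + [""]
    + [row(i, f) for i, f in enumerate(["Dziewczynka", "i", "chłopiec", "śpiewają", "."], 1)]
)

n_sentences, n_tokens = corpus_stats(sample)
average = n_tokens / n_sentences
```

To run it on a real file, pass an open file handle instead of `sample`, e.g. `corpus_stats(open(path, encoding="utf-8"))`. With the PDB 2.0 figures, 351,175 / 22,208 rounds to the 15.8 tokens per sentence stated above.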

==== PDB relation types ====
Descriptions of the Polish dependency relation types are at [[http://zil.ipipan.waw.pl/PDB/DepRelTypes]] (outdated).


== Polish Dependency Bank in Universal Dependencies format (PDBUD) ==

PDBUD is a conversion of PDB into a UD-like format. It is an extended and corrected version of the Polish UD treebank (release 2.1). PDBUD contains enhanced graphs, i.e. trees with enhanced edges that encode shared dependents of coordinated elements, e.g. ''Dziewczynka śpiewa i tańczy.'' (The girl sings and dances.), and shared governors of coordinated elements, e.g. ''Dziewczynka i chłopiec śpiewają.'' (A girl and a boy sing.)
PDBUD trees were used in:
 * Shared task on automatic identification of verbal multiword expressions (LAW-MWE-CxG-2018)
 * Shared task on dependency parsing of Polish (PolEval 2018, [[http://poleval.pl]])
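
To illustrate the enhanced edges described above, here is a small sketch that reads the DEPS column of CoNLL-U-style rows and finds tokens with more than one governor. The rows are a hand-written illustration following general UD conventions (`nsubj`, `conj`, `cc`, `punct`), not copied from PDBUD, so the actual PDBUD labels may differ; the name `enhanced_edges` is hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative rows for "Dziewczynka śpiewa i tańczy." in the 10-column
# CoNLL-U layout: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
# Assumption: hand-written example, not taken from the PDBUD data.
ROWS = [
    "\t".join(cols)
    for cols in [
        ["1", "Dziewczynka", "_", "_", "_", "_", "2", "nsubj", "2:nsubj|4:nsubj", "_"],
        ["2", "śpiewa", "_", "_", "_", "_", "0", "root", "0:root", "_"],
        ["3", "i", "_", "_", "_", "_", "4", "cc", "4:cc", "_"],
        ["4", "tańczy", "_", "_", "_", "_", "2", "conj", "2:conj", "_"],
        ["5", ".", "_", "_", "_", "_", "2", "punct", "2:punct", "_"],
    ]
]


def enhanced_edges(rows: List[str]) -> Dict[int, List[Tuple[int, str]]]:
    """Map each token ID to its (head ID, relation) pairs from the DEPS column.

    Assumption: integer token IDs only (no multiword ranges or empty nodes).
    """
    edges: Dict[int, List[Tuple[int, str]]] = {}
    for line in rows:
        cols = line.split("\t")
        token_id, deps = int(cols[0]), cols[8]
        pairs: List[Tuple[int, str]] = []
        if deps != "_":
            for item in deps.split("|"):
                # Split on the first colon only; relation subtypes contain colons.
                head, rel = item.split(":", 1)
                pairs.append((int(head), rel))
        edges[token_id] = pairs
    return edges


edges = enhanced_edges(ROWS)
# Tokens with more than one governor are the shared dependents.
shared = [tid for tid, pairs in edges.items() if len(pairs) > 1]
```

In this illustration the subject ''Dziewczynka'' (token 1) is attached to both coordinated verbs, which is the shared-dependent case; the shared-governor case would show up the same way, with each conjunct carrying an edge to the common head.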

==== PDBUD data ====
PDBUD is publicly available at [[http://git.nlp.ipipan.waw.pl/alina/PDBUD]].

== PDB parser ==
Some dependency parsing models trained on PDB are available at [[http://zil.ipipan.waw.pl/PDB/PDBparser]].

== Publications ==

== Acknowledgements ==
The creation of PDB was supported by grant no. POIG.01.01.02-14-013/09 from the Innovative Economy Operational Programme, co-financed by the European Union (European Regional Development Fund), and by a grant from the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure (2016-2018).


== Licence ==

The resource is distributed under the [[https://creativecommons.org/licenses/by-nc-sa/4.0/|CC BY-NC-SA 4.0]] licence.

== Contact ==
Questions or comments? Please send them to Alina Wróblewska <<MailTo(alina AT SPAMFREE ipipan DOT waw DOT pl)>>.
