= Polish Dependency Bank (PDB) =

Under development.

|| Number of sentences || 22,208 ||
|| Number of tokens || 351,715 ||
|| Tokens per sentence || 15.84 ||
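The tokens-per-sentence figure is the token count divided by the sentence count (351,715 / 22,208 ≈ 15.84). The minimal sketch below shows how such statistics could be recomputed, assuming the trees are stored in CoNLL-U, the format of the UD version (PDBUD). It counts syntactic words and skips multiword-token ranges and empty nodes; whether the table above follows exactly this counting convention is an assumption, and `corpus_stats` is a hypothetical helper name.

```python
def corpus_stats(conllu_text: str) -> tuple[int, int, float]:
    """Return (sentences, words, words per sentence) for CoNLL-U text (assumed input format)."""
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # a blank line closes the current sentence
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        in_sentence = True
        word_id = line.split("\t", 1)[0]
        # assumption: multiword-token ranges ("1-2") and empty nodes ("5.1") are not counted
        if "-" in word_id or "." in word_id:
            continue
        words += 1
    if in_sentence:  # the last sentence may lack a trailing blank line
        sentences += 1
    ratio = words / sentences if sentences else 0.0
    return sentences, words, ratio


# a hand-written one-sentence example, not taken from PDB
sample = "\n".join([
    "# text = Dziewczynka śpiewa.",
    "1\tDziewczynka\tdziewczynka\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tśpiewa\tśpiewać\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])
print(corpus_stats(sample))            # (1, 3, 3.0)
print(round(351_715 / 22_208, 2))      # 15.84
```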

== PDB relation types ==

Descriptions of Polish dependency relation types are available at [[http://zil.ipipan.waw.pl/PDB/DepRelTypes]] (outdated).

== PDB parser ==

Some dependency parsing models trained on PDB are available at [[http://zil.ipipan.waw.pl/PDB/PDBparser]].

= Polish Dependency Bank in Universal Dependencies format (PDBUD) =

PDB is an extended version of the "Składnica" treebank. Since the UD conversion of the "Składnica" trees constitutes the Polish treebank in the Universal Dependencies collection (release 2.1), PDBUD is an extended and corrected version of that treebank.

The converted PDBUD trees are largely consistent with the Polish UD trees. The "Składnica" trees are rather simple, and the sentences underlying that data set lack several linguistic phenomena, e.g. ellipsis, comparative constructions, constructions with the bi-functional subordinating conjunction "jako", direct speech, interpolations and comments, nominative noun phrases used in the vocative function, and many others. Therefore, the repertoire of UD relation subtypes and language-specific features is slightly extended to cover these phenomena. Furthermore, PDBUD trees contain enhanced edges encoding shared dependents of coordinated elements, e.g. "Dziewczynka śpiewa i tańczy." (The girl sings and dances.), and shared governors of coordinated elements, e.g. "Dziewczynka i chłopiec śpiewają." (A girl and a boy sing.).
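To illustrate how such enhanced edges can be read, here is a minimal Python sketch. It assumes PDBUD follows the standard CoNLL-U layout, in which enhanced edges are listed in the ninth (DEPS) column as `head:relation` pairs separated by `|`. The two sentences are hand-annotated for illustration only and are not copied from PDBUD; `row` and `multi_head_words` are hypothetical helper names. A word with more than one enhanced head is either a dependent shared by coordinated predicates (first example) or a non-first conjunct that also receives the governor of the coordination (second example).

```python
def multi_head_words(conllu_sentence: str) -> list[tuple[str, list[tuple[str, str]]]]:
    """Return (form, [(head, relation), ...]) for words with several enhanced heads."""
    found = []
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        word_id, form, deps = cols[0], cols[1], cols[8]
        # skip multiword-token ranges, empty nodes and words without enhanced annotation
        if "-" in word_id or "." in word_id or deps == "_":
            continue
        # a relation may itself contain ":" (e.g. "obl:arg"), so split on the first colon only
        heads = [tuple(item.split(":", 1)) for item in deps.split("|")]
        if len(heads) > 1:
            found.append((form, heads))
    return found


def row(idx, form, head, rel, deps):
    # ten CoNLL-U columns; LEMMA, UPOS, XPOS, FEATS and MISC are left empty ("_")
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), rel, deps, "_"])


# illustrative annotation: the subject is shared by both coordinated verbs
shared_dependent = "\n".join([
    row(1, "Dziewczynka", 2, "nsubj", "2:nsubj|4:nsubj"),
    row(2, "śpiewa", 0, "root", "0:root"),
    row(3, "i", 4, "cc", "4:cc"),
    row(4, "tańczy", 2, "conj", "2:conj"),
    row(5, ".", 2, "punct", "2:punct"),
])
# illustrative annotation: both coordinated nouns share the governing verb
shared_governor = "\n".join([
    row(1, "Dziewczynka", 4, "nsubj", "4:nsubj"),
    row(2, "i", 3, "cc", "3:cc"),
    row(3, "chłopiec", 1, "conj", "1:conj|4:nsubj"),
    row(4, "śpiewają", 0, "root", "0:root"),
    row(5, ".", 4, "punct", "4:punct"),
])

print(multi_head_words(shared_dependent))
# [('Dziewczynka', [('2', 'nsubj'), ('4', 'nsubj')])]
print(multi_head_words(shared_governor))
# [('chłopiec', [('1', 'conj'), ('4', 'nsubj')])]
```

Heads are kept as strings rather than integers on purpose: in CoNLL-U an enhanced head may point to an empty node such as `5.1`.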

PDBUD trees are used in:

=== Shared task on automatic identification of verbal multiword expressions (LAW-MWE-CxG-2018) ===

=== Shared task on dependency parsing of Polish (PolEval 2018, [[http://poleval.pl]]) ===

== Data ==

Note: since PDBUD data are used in PolEval 2018, the test data are currently not publicly available.

=== Basic PDBUD trees ===

Last edited 2018-06-15 09:26:56 by AlinaWroblewska (revision 8; first revision 2017-05-24 12:22:13).