Krzaki (bushes)

A corpus of Polish manually annotated with dependency structures. It consists of about 20,000 sentences (the same set as used in Składnica), which a team of about 10 linguists annotated with unlabelled dependency structures independently of Składnica.

Each sentence was first annotated by two or three annotators. If their trees differed in at least one point, a superannotator decided on the final tree. The superannotator also maintained the shared annotation manual and answered questions from all the linguists.
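
The discrepancy check that decides whether a sentence goes to the superannotator can be sketched as follows. This is a minimal illustration, not the project's actual tooling: it assumes, hypothetically, that each annotator's tree is stored as a list of head indices (with 0 marking the root) and treats any segment with differing heads as a discrepancy.

```python
from typing import List, Sequence

# Hypothetical representation: heads[i] is the head of segment i+1,
# and 0 marks the root of the sentence.
Heads = Sequence[int]


def disagreeing_segments(annotations: List[Heads]) -> List[int]:
    """Return the 1-based ids of segments whose head differs between annotators."""
    if len({len(a) for a in annotations}) != 1:
        raise ValueError("all annotations must cover the same segments")
    # zip(*annotations) groups the heads that all annotators gave to one segment.
    return [i + 1 for i, heads in enumerate(zip(*annotations)) if len(set(heads)) > 1]


def needs_superannotation(annotations: List[Heads]) -> bool:
    """A sentence goes to the superannotator if there is at least one discrepancy."""
    return bool(disagreeing_segments(annotations))
```

For example, with three annotators who agree on everything except the head of the third segment, `disagreeing_segments([[2, 0, 2], [2, 0, 2], [2, 0, 1]])` returns `[3]`, so the sentence would be passed on for superannotation.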

Of the 20,012 sentences, 10,617 did not require superannotation, whereas 9,395 did.

The treebank specifies only the links between segments and their heads, without labelling the functions of these links. Unlike Składnica, which contains only sentences that could be parsed by Świgra, this treebank was created manually from a representative set of sentences drawn from the NKJP subcorpus that was manually disambiguated for morphosyntax.
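
Because only segment-head links are recorded, each sentence's tree can be held as a plain array of head indices. The sketch below checks that such an array encodes a well-formed tree. The array layout (0 for the root) and the default single-root requirement are assumptions for illustration, not documented properties of the Krzaki files.

```python
from typing import Sequence


def is_well_formed_tree(heads: Sequence[int], single_root: bool = True) -> bool:
    """Check that an unlabelled head array encodes a dependency tree.

    heads[i] is the head of segment i+1; 0 denotes the artificial root.
    """
    n = len(heads)
    # Every head must point at the root (0) or at an existing segment.
    if any(h < 0 or h > n for h in heads):
        return False
    # Assumption: exactly one segment attaches to the root unless told otherwise.
    if single_root and sum(1 for h in heads if h == 0) != 1:
        return False
    # Follow head links upwards from each segment. In a tree no path to the
    # root is longer than n steps, so exceeding n steps means a cycle.
    for start in range(1, n + 1):
        node, steps = start, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:
                return False
    return True
```

With this check, `[2, 0, 2]` (segment 2 as the root, governing segments 1 and 3) is accepted, while `[2, 1, 0]` is rejected because segments 1 and 2 point at each other.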

The corpus is distributed in the CoNLL format.
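
A minimal reader for the distributed files might look like the sketch below. It assumes the 10-column, tab-separated CoNLL-X layout, with HEAD in the seventh column and a blank line after each sentence; check the actual files before relying on it. Because the treebank is unlabelled, only the segment id, form and head are kept. The sample sentence in the usage part is made up.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# Assumption: 10-column CoNLL-X layout
# (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL).
ID, FORM, HEAD = 0, 1, 6

Token = Tuple[int, str, int]  # (segment id, form, head id; 0 = root)


def read_conll(stream: TextIO) -> Iterator[List[Token]]:
    """Yield each sentence as a list of (id, form, head) triples."""
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # comment lines, if present
        cols = line.split("\t")
        if not cols[ID].isdigit():
            continue  # skip multiword or empty-node lines, if present
        sentence.append((int(cols[ID]), cols[FORM], int(cols[HEAD])))
    if sentence:
        yield sentence  # the file may not end with a blank line


if __name__ == "__main__":
    # Made-up two-segment sentence in the assumed layout.
    sample = "1\tAla\t_\t_\t_\t_\t2\t_\t_\t_\n2\tma\t_\t_\t_\t_\t0\t_\t_\t_\n"
    for sent in read_conll(io.StringIO(sample)):
        print(sent)  # [(1, 'Ala', 2), (2, 'ma', 0)]
```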