Revision 5 as of 2017-03-11 16:45:12

TreebankWydzwieku

This is the home page of the Polish Sentiment Treebank.

The dataset can be downloaded here.

The dataset is a dependency treebank with sentiment annotations. It was parsed using the Polish dependency parser models available from http://zil.ipipan.waw.pl/PolishDependencyParser.

For each sentence in the treebank, the sentiment of each sub-phrase (sub-tree) has been assigned by a linguist. The sentiment of each leaf word has been labelled according to a Polish sentiment dictionary, an extension of http://zil.ipipan.waw.pl/SlownikWydzwieku/, and then verified manually.

Sentiment labels of both phrases and leaves include three classes: neutral, positive and negative.
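As an illustration of the annotation scheme described above, the following is a minimal Python sketch of a sentiment-labelled dependency tree. It is not the distribution format of the dataset, which is not described on this page. The `Node` class, its field names, the helper functions and the example sentence with its labels are all hypothetical and serve only to show how every sub-tree carries one of the three labels.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# The three sentiment classes named on this page.
LABELS = ("neutral", "positive", "negative")


@dataclass
class Node:
    """One word of a dependency tree (hypothetical in-memory representation).

    `sentiment` is the label of the whole sub-tree rooted at this word;
    for a leaf, the sub-tree is just the word itself.
    """
    form: str
    sentiment: str
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.sentiment not in LABELS:
            raise ValueError(f"unknown sentiment label: {self.sentiment!r}")

    def is_leaf(self) -> bool:
        return not self.children


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the root of every sub-tree, starting with `node` itself."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def multiword_phrases(root: Node) -> List[Node]:
    """Return the roots of sub-trees that span more than one word."""
    return [n for n in subtrees(root) if not n.is_leaf()]


# Illustrative example only (labels invented for this sketch):
# "nie polecam" ("I do not recommend"), where negation yields a negative phrase.
example = Node("polecam", "negative", [Node("nie", "neutral")])
```

In this sketch, `multiword_phrases(example)` returns a single phrase (the whole sentence), while `subtrees(example)` also visits the leaf word "nie".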

The treebank has been created specifically for the purpose of analysing compositional sentiment effects in the Polish language.

It has been built from two parts. The first part is a subset of the Skladnica treebank, namely the sentences that contain at least one sentiment-bearing word. This part consists of 235 sentences (1915 sentiment-annotated multiword phrases). The second part consists of 965 sentences from a product review corpus available from http://zil.ipipan.waw.pl/OPTA/ and contains 4640 sentiment-annotated multiword phrases.

Together, the current version of the treebank consists of 6555 sentiment-annotated phrases from the parse trees of 1200 sentences. The resource described is the first freely available corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in the Polish language.
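The totals above follow directly from the sizes of the two parts. A short Python sketch of the arithmetic, using only the figures given on this page (the dictionary keys are hypothetical names chosen for this example):

```python
# Figures as stated on this page; the key names are illustrative only.
parts = {
    "skladnica": {"sentences": 235, "phrases": 1915},
    "product_reviews": {"sentences": 965, "phrases": 4640},
}

total_sentences = sum(p["sentences"] for p in parts.values())
total_phrases = sum(p["phrases"] for p in parts.values())

print(total_sentences)  # 1200
print(total_phrases)    # 6555
```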