This is the home page of the Polish Sentiment Treebank.

The dataset is a dependency treebank with sentiment annotations. It was parsed using the Polish dependency parser models available from http://zil.ipipan.waw.pl/PolishDependencyParser.

For each sentence in the treebank, a linguist assigned the sentiment of each sub-phrase (sub-tree). The sentiment of each leaf word was labelled according to a Polish sentiment dictionary and partially verified manually.

Both phrases and leaves are labelled with one of three classes: neutral, positive, or negative.

The sentiment annotation of each token describes the overall sentiment of the whole phrase beneath it, including the token itself. Specifically:

  • for every leaf token or word, the sentiment field describes the sentiment of that word or token alone
  • for every non-leaf token or word (a node with a non-empty set of children), the sentiment field describes the sentiment of the whole phrase formed by the sub-tree rooted at this token (that is, this token and all tokens below it)
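
The annotation scheme above can be illustrated with a minimal Python sketch. The file format of the treebank is not described on this page, so the `Token` class, its field names, and the three-word example sentence are all hypothetical and invented for illustration only; they are not taken from the treebank.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical in-memory representation, not the treebank's file format.
    idx: int        # 1-based position in the sentence
    form: str       # surface word
    head: int       # idx of the parent token, 0 for the root
    sentiment: str  # "neutral", "positive" or "negative"


def subtree(tokens, root_idx):
    """Return the tokens of the sub-tree rooted at root_idx, in sentence order."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    by_idx = {t.idx: t for t in tokens}
    stack, found = [root_idx], []
    while stack:
        i = stack.pop()
        found.append(by_idx[i])
        stack.extend(children.get(i, []))
    return sorted(found, key=lambda t: t.idx)


def labelled_phrases(tokens):
    """Pair each token's sentiment label with the phrase that label describes."""
    return [
        (" ".join(t.form for t in subtree(tokens, tok.idx)), tok.sentiment)
        for tok in tokens
    ]


# Invented example: "Bardzo dobry film" ("A very good film").
sentence = [
    Token(1, "Bardzo", 2, "neutral"),   # leaf: label of the word itself
    Token(2, "dobry", 3, "positive"),   # non-leaf: label of "Bardzo dobry"
    Token(3, "film", 0, "positive"),    # root: label of the whole sentence
]

for phrase, label in labelled_phrases(sentence):
    print(f"{label:8} {phrase}")
```

In this sketch the label stored on "film" describes the full phrase "Bardzo dobry film" and not the word "film" in isolation, which is the distinction the two bullet points draw between leaf and non-leaf tokens.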

The treebank was created specifically for analysing compositional sentiment effects in the Polish language.

Treebank Wydzwieku: version 1.0

  • The first part of the treebank is drawn from a subset of the Skladnica treebank, namely the sentences that contain at least one sentiment-bearing word. This part consists of 235 sentences (1915 sentiment-annotated multiword phrases).
  • The second part consists of 965 sentences from a product review corpus available from http://zil.ipipan.waw.pl/OPTA/. It contains 4640 sentiment-annotated multiword phrases.

Together, the first version of the treebank consisted of 6555 sentiment-annotated phrases from the parse trees of 1200 sentences. It is the first freely available corpus with fully labelled parse trees that allows for a complete analysis of the compositional effects of sentiment in the Polish language.

Treebank Wydzwieku: version 2.0

In August 2018, having added many sentences, we released version 2.0 of the treebank. It contains the following new parts:

  • test sentences from the PolEval 2017 sentiment task
  • 2 x 500 sentences collected from various sources on the web, mostly difficult ones with mixed or negative sentiment



Have questions or ideas?

Please contact me: axw at ipipan dot ...