Revision 16 as of 2018-08-11 18:35:43
This is the home page of the Polish Sentiment Treebank

The dataset is a dependency treebank with sentiment annotations. It was parsed using the Polish dependency parser models available from http://zil.ipipan.waw.pl/PolishDependencyParser.

For each sentence in the treebank, the sentiment of each sub-phrase (sub-tree) was assigned by a linguist. The sentiment of each leaf word was labelled according to a Polish sentiment dictionary and partially verified manually.

Sentiment labels of both phrases and leaves include three classes: neutral, positive and negative.

The sentiment annotation of each token corresponds to the overall sentiment of the whole phrase below it, the token itself included. Specifically:

  • for every leaf token (word), the sentiment field describes the sentiment of that token alone
  • for every non-leaf token (a node with a non-empty set of children), the sentiment field describes the sentiment of the whole phrase formed by the sub-tree rooted at this token (the token itself and all tokens below it)
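The annotation scheme described above can be illustrated with a minimal Python sketch. It is not the treebank's file format or an official reader: the `Node` class, its field names, and the example sentence with its labels are assumptions made up for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# The three sentiment classes used for both phrases and leaves.
LABELS = {"neutral", "positive", "negative"}


@dataclass
class Node:
    """Hypothetical in-memory node of a sentiment-annotated dependency tree."""
    index: int       # 1-based position of the token in the sentence
    form: str        # surface form of the token
    sentiment: str   # for non-leaf nodes: sentiment of the whole sub-tree
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.sentiment not in LABELS:
            raise ValueError(f"unknown sentiment label: {self.sentiment}")


def subtree(node: Node) -> List[Node]:
    """Return the node and all nodes below it, in sentence order."""
    nodes = [node]
    for child in node.children:
        nodes.extend(subtree(child))
    return sorted(nodes, key=lambda n: n.index)


def phrase_sentiments(root: Node) -> Iterator[Tuple[str, str]]:
    """Yield (phrase text, sentiment) for every non-leaf node."""
    for node in subtree(root):
        if node.children:
            text = " ".join(n.form for n in subtree(node))
            yield text, node.sentiment


# Invented example ("I do not recommend this phone"); labels are illustrative.
root = Node(2, "polecam", "negative", [
    Node(1, "Nie", "neutral"),
    Node(4, "telefonu", "neutral", [Node(3, "tego", "neutral")]),
])

for text, label in phrase_sentiments(root):
    print(f"{label:8s} {text}")
```

Here the label stored on `polecam` describes the whole sentence, while the label on `telefonu` describes only the phrase `tego telefonu`; leaf labels describe single words.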

The treebank was created specifically for analysing compositional sentiment effects in the Polish language.

Treebank Wydzwieku: version 1.0 (initial)

  • The first part of the treebank is a subset of the Skladnica treebank, namely the sentences that contain at least one sentiment-bearing word. This part consists of 235 sentences (1915 sentiment-annotated multiword phrases).
  • The second part consists of 965 sentences from a product review corpus available from http://zil.ipipan.waw.pl/OPTA/. It contains 4640 sentiment-annotated multiword phrases.

Together, the first version of the treebank consisted of 6555 sentiment-annotated phrases from the parse trees of 1200 sentences. The resource described is the first freely available corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in the Polish language.
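As a quick consistency check, the totals for version 1.0 follow from the per-part counts given above. The dictionary keys in this short Python sketch are informal labels chosen for illustration, not official names.

```python
# Per-part counts for version 1.0: (sentences, sentiment-annotated phrases).
parts = {
    "Skladnica subset": (235, 1915),
    "product reviews (OPTA)": (965, 4640),
}

total_sentences = sum(sentences for sentences, _ in parts.values())
total_phrases = sum(phrases for _, phrases in parts.values())

print(total_sentences, total_phrases)  # 1200 6555
```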

Treebank Wydzwieku: version 2.0 (Aug 2018)

As we have added many sentences, we have released version 2.0 of the treebank. It contains the following new parts:

  • test sentences from the PolEval 2017 sentiment task
  • 2 x 500 sentences collected from various sources on the web, mostly difficult, mixed-sentiment and negative ones

Download

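Releases:

  • version 1.0: http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target=TreebankWydzwieku01.zip
  • version 2.0 [new!]: http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target=TreebankWydzwieku2.0.tar.gz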
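The release archives can also be fetched from a script. The following is a minimal Python sketch using only the standard library; the helper names `archive_name` and `fetch_release` and the default `treebank` directory are assumptions for illustration, and nothing is assumed about the layout of the files inside the archives.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import parse_qs, urlsplit

BASE = "http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target="

# Attachment URLs of the two releases listed on this page.
RELEASES = {
    "1.0": BASE + "TreebankWydzwieku01.zip",
    "2.0": BASE + "TreebankWydzwieku2.0.tar.gz",
}


def archive_name(url: str) -> str:
    """Extract the attachment file name from the `target` query parameter."""
    return parse_qs(urlsplit(url).query)["target"][0]


def fetch_release(version: str, dest_dir: str = "treebank") -> Path:
    """Download one release and unpack it into dest_dir (hypothetical helper)."""
    url = RELEASES[version]
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / archive_name(url)
    urllib.request.urlretrieve(url, str(archive_path))
    # unpack_archive picks zip or tar.gz handling from the file extension.
    shutil.unpack_archive(str(archive_path), str(dest))
    return dest
```

Calling `fetch_release("2.0")` would download and unpack the newer release; the call is left out of the block so that nothing is downloaded on import.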

Have questions or ideas?

Please contact me: axw at ipipan dot ...