This is the home page of the Polish Sentiment Treebank.

The dataset is a dependency treebank with sentiment annotations. It was parsed using the Polish dependency parser models available from http://zil.ipipan.waw.pl/PolishDependencyParser.

For each sentence in the treebank, the sentiment of each sub-phrase (sub-tree) was assigned by a linguist. The sentiment of each leaf word was labelled according to a Polish sentiment dictionary, an extension of http://zil.ipipan.waw.pl/SlownikWydzwieku/, and then verified manually.

Both phrases and leaves carry one of three sentiment labels: neutral, positive or negative.

The treebank was created specifically for analysing compositional sentiment effects in the Polish language.
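
The annotation scheme described above (a word-level label on every node plus a phrase-level label on every sub-tree) can be pictured with a small in-memory model. This is only a sketch: the distribution file format is not described on this page, so the `Node` class, its field names, the `compositional_shifts` helper and the example sentence are all hypothetical illustrations and are not taken from the treebank itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# The three classes named on this page.
LABELS = ("negative", "neutral", "positive")


@dataclass
class Node:
    """One word of a dependency tree (hypothetical in-memory form)."""
    word: str
    word_sentiment: str    # dictionary-based label of the word itself
    phrase_sentiment: str  # label of the whole sub-tree rooted at this word
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        for label in (self.word_sentiment, self.phrase_sentiment):
            if label not in LABELS:
                raise ValueError(f"unknown sentiment label: {label!r}")

    def subtree(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, head first."""
        yield self
        for child in self.children:
            yield from child.subtree()


def compositional_shifts(root: Node) -> List[Tuple[str, str, str]]:
    """List multiword sub-trees whose phrase label differs from the head word's label."""
    return [
        (node.word, node.word_sentiment, node.phrase_sentiment)
        for node in root.subtree()
        if node.children and node.word_sentiment != node.phrase_sentiment
    ]


# Invented example (not from the treebank): "Nie polecam tego telefonu"
# ("I do not recommend this phone"), where negation flips the head verb.
example = Node(
    "polecam", "positive", "negative",
    children=[
        Node("Nie", "neutral", "neutral"),
        Node("telefonu", "neutral", "neutral",
             children=[Node("tego", "neutral", "neutral")]),
    ],
)

print(compositional_shifts(example))
```

Sub-trees where the two labels disagree are the cases of interest when studying compositional effects such as negation.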

Treebank Wydzwieku: version 1.0 (initial)

  • The first part of the treebank is drawn from a subset of the Skladnica treebank, namely sentences that contain at least one sentiment-bearing word. This part consists of 235 sentences (1915 sentiment-annotated multiword phrases).
  • The second part consists of 965 sentences from a product review corpus available from http://zil.ipipan.waw.pl/OPTA/. It contains 4640 sentiment-annotated multiword phrases.

Together, the first version of the treebank consisted of 6555 sentiment-annotated phrases from the parse trees of 1200 sentences. It is the first freely available corpus with fully labelled parse trees that allows a complete analysis of compositional sentiment effects in the Polish language.

Treebank Wydzwieku: version 2.0 (Aug 2018)

After adding many sentences, we have released version 2.0 of the treebank. It contains the following new parts:

  • test sentences from the PolEval 2017 sentiment task
  • 2 x 500 sentences collected from various sources on the web, mostly difficult ones with mixed or negative sentiment

Download

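Releases:

  • version 1.0: http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target=TreebankWydzwieku01.zip
  • version 2.0 [new!]: http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target=TreebankWydzwieku2.0.tar.gz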
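
For scripted use, a release can be fetched and unpacked with the Python standard library alone. The sketch below assumes the two attachment URLs of this wiki page (version 1.0 as a .zip, version 2.0 as a .tar.gz) are still reachable. The helper names `archive_name` and `fetch_release` and the `treebank` target directory are hypothetical, and nothing is assumed about the layout of the files inside the archives.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Attachment URLs of the two releases on this wiki page.
RELEASES = {
    "1.0": "http://zil.ipipan.waw.pl/TreebankWydzwieku"
           "?action=AttachFile&do=get&target=TreebankWydzwieku01.zip",
    "2.0": "http://zil.ipipan.waw.pl/TreebankWydzwieku"
           "?action=AttachFile&do=get&target=TreebankWydzwieku2.0.tar.gz",
}


def archive_name(url: str) -> str:
    """Read the attachment file name from the 'target' query parameter."""
    return parse_qs(urlparse(url).query)["target"][0]


def fetch_release(version: str, dest_dir: str = "treebank") -> Path:
    """Download one release (unless already cached) and unpack it.

    Returns the directory the archive was unpacked into.
    """
    url = RELEASES[version]
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    archive = dest / archive_name(url)
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))

    out = dest / f"v{version}"
    # unpack_archive picks zip or gzipped tar from the file extension.
    shutil.unpack_archive(str(archive), str(out))
    return out
```

Calling `fetch_release("2.0")` would download the archive into `treebank/` and unpack it into `treebank/v2.0`.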

Have questions or ideas?

Please contact me: axw at ipipan dot ...