This is the home page of the Polish Sentiment Treebank.
The dataset is a dependency treebank with sentiment annotations. It was parsed using the Polish dependency parser models available from http://zil.ipipan.waw.pl/PolishDependencyParser.
For each sentence in the treebank, the sentiment of each sub-phrase (sub-tree) was assigned by a linguist. The sentiment of each leaf word was labelled according to a Polish sentiment dictionary, an extension of http://zil.ipipan.waw.pl/SlownikWydzwieku/, and then verified manually.
Both phrases and leaves carry one of three sentiment labels: neutral, positive or negative.
The treebank was created specifically for analysing compositional sentiment effects in the Polish language.
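The annotation scheme above can be pictured as a tree in which every node carries two labels: one for the word itself and one for the phrase (sub-tree) it heads. The following is only a minimal sketch of that idea, not the treebank's file format or any tool shipped with it. The `Node` class, its field names, the `multiword_phrase_labels` helper and the example phrase with its labels are all hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator, List

# The three classes named on this page.
LABELS = ("neutral", "positive", "negative")


@dataclass
class Node:
    """One token of a dependency tree; its sub-tree is the phrase it heads.

    Hypothetical structure for illustration, not the treebank's own format.
    """
    form: str
    leaf_sentiment: str    # label of the word itself (dictionary-based)
    phrase_sentiment: str  # label of the whole sub-tree (assigned by a linguist)
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything outside the three documented classes.
        for label in (self.leaf_sentiment, self.phrase_sentiment):
            if label not in LABELS:
                raise ValueError(f"unknown sentiment label: {label!r}")

    def subtrees(self) -> Iterator["Node"]:
        """Yield this node and every node below it, depth-first."""
        yield self
        for child in self.children:
            yield from child.subtrees()


def multiword_phrase_labels(root: Node) -> Counter:
    """Count phrase labels over sub-trees that span more than one word."""
    return Counter(n.phrase_sentiment for n in root.subtrees() if n.children)


# Invented example: "niezbyt dobry film" ("not a very good film").
# The labels below are made up to show a compositional effect: a positive
# word ("dobry") inside phrases that are negative as a whole.
example = Node("film", "neutral", "negative", [
    Node("dobry", "positive", "negative", [
        Node("niezbyt", "neutral", "neutral"),
    ]),
])

print(multiword_phrase_labels(example))
```

Keeping the leaf label and the phrase label as separate fields makes the compositional effect directly visible: a node whose `leaf_sentiment` differs from its `phrase_sentiment` marks a place where context changed the polarity.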
Treebank Wydzwieku: version 1.0 (initial)
- The first part of the treebank is drawn from a subset of the Skladnica treebank, namely the sentences that contain at least one sentiment-bearing word. This part consists of 235 sentences (1915 sentiment-annotated multiword phrases).
- The second part consists of 965 sentences from a product review corpus available from http://zil.ipipan.waw.pl/OPTA/. It contains 4640 sentiment-annotated multiword phrases.
Together, the first version of the treebank consists of 6555 sentiment-annotated phrases from the parse trees of 1200 sentences. It is the first freely available corpus with fully labeled parse trees that allows for a complete analysis of compositional sentiment effects in the Polish language.
Treebank Wydzwieku: version 2.0 (August 2018)
Having added many sentences, we have released version 2.0 of the treebank. It contains the following new parts:
- test sentences from the PolEval 2017 sentiment task
- 2 x 500 sentences collected from various sources on the web, mostly difficult ones with mixed or negative sentiment
Download
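Releases:
- version 1.0: http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target=TreebankWydzwieku01.zip
- version 2.0: http://zil.ipipan.waw.pl/TreebankWydzwieku?action=AttachFile&do=get&target=TreebankWydzwieku2.0.tar.gz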
Have questions or ideas?
Please contact me: axw at ipipan dot ...