Diff for "PolishSummariesCorpus"

Differences between revisions 22 and 23
Revision 22 as of 2021-04-21 10:30:20
Size: 2890
Comment:
Revision 23 as of 2021-04-21 17:44:54
Size: 3554
Editor: MateuszKopec
Comment:
Line 57: Line 57:
Revision 22:
  "description":"Corpus of Polish news summaries.",
Revision 23:
  "description":"Corpus of Polish news summaries. A resource created to support the development and evaluation of the tools for
automated single-document summarization of Polish. The corpus contains a large number of manual summaries of news articles,
with many independently created summaries for a single text. Such approach is supposed to overcome the annotator bias, which is
often described as a problem during the evaluation of the summarization algorithms against a single gold standard. The corpus
includes both abstract free-word summaries, as well as extraction-based summaries created by selecting text spans from the original
document.",
Line 59: Line 64:
  "license":"https://creativecommons.org/licenses/by/3.0/",

Polish Summaries Corpus

This page offers the official release of the corpus of Polish news summaries under the Creative Commons Attribution 3.0 Unported License. Its creation was co-funded by the ATLAS project and by the European Union from resources of the European Social Fund (Project PO KL "Information technologies: Research and their interdisciplinary applications"). By downloading the corpus data you accept the conditions of that license.

Contact person: Mateusz Kopeć
License: CC BY v.3

The texts to summarize were extracted from http://www.cs.put.poznan.pl/dweiss/research/rzeczpospolita/ and are currently available under the terms stated on that corpus's webpage.

Documentation

Description of the corpus (in English).

Downloads

A preliminary version of the corpus is available for download at the following link:

A Java API for the corpus is available:

  • the source code is available in a git repository

  • Maven users may add the following dependency:

<dependency>
  <groupId>pl.waw.ipipan.zil.summ</groupId>
  <artifactId>pscapi</artifactId>
  <version>1.0</version>
</dependency>

and the following repository:

<repository>
  <id>zil-maven-repo</id>
  <name>ZIL maven repository</name>
  <url>http://maven.nlp.ipipan.waw.pl/content/repositories/releases/</url>
</repository>   
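
For readers unsure where the two fragments above belong, the following is a minimal sketch of a complete pom.xml that combines them. The project coordinates (com.example, psc-demo, 0.1.0-SNAPSHOT) are hypothetical placeholders for the consuming project and are not part of the corpus distribution; only the dependency and repository entries are taken from this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates of your own project (assumption, replace as needed) -->
  <groupId>com.example</groupId>
  <artifactId>psc-demo</artifactId>
  <version>0.1.0-SNAPSHOT</version>

  <!-- Repository entry copied from this page; it goes inside <repositories> -->
  <repositories>
    <repository>
      <id>zil-maven-repo</id>
      <name>ZIL maven repository</name>
      <url>http://maven.nlp.ipipan.waw.pl/content/repositories/releases/</url>
    </repository>
  </repositories>

  <!-- Dependency entry copied from this page; it goes inside <dependencies> -->
  <dependencies>
    <dependency>
      <groupId>pl.waw.ipipan.zil.summ</groupId>
      <artifactId>pscapi</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
</project>
```

Note that Maven 3.8.1 and later block plain-http repositories by default, so resolving this repository with a recent Maven may require additional configuration in settings.xml.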

Citing

When using the Polish Summaries Corpus, please cite the following article:

Maciej Ogrodniczuk and Mateusz Kopeć. The Polish Summaries Corpus. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, pages 3712–3715, Reykjavík, Iceland, 2014. European Language Resources Association (ELRA).