

Polish Coreference Corpus Converters

This page hosts the official release of several data format converters used during the creation of the Polish Coreference Corpus, distributed under the Creative Commons Attribution 3.0 Unported License. By downloading them you accept the conditions of that license.

Principal developer: Mateusz Kopeć
Authors: Mateusz Kopeć
License: CC BY v.3


There are 3 converters:

  • mmax2tei - converting a corpus in MMAX PCC format to TEI PCC format
  • tei2brat - converting a corpus in TEI PCC format to brat PCC format
  • tei2mmax - converting a corpus in TEI PCC format to MMAX PCC format


Each converter may be run using the following command:

java -jar converter_standalone_jar input_dir target_dir

where converter_standalone_jar is one of the standalone .jar files presented below, input_dir is the directory containing the corpus in the converter's source format, and target_dir is the directory where the converted corpus will be written.
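When a conversion has to be repeated or scripted, the command above can be wrapped in a small helper. The following is only a minimal sketch in Python, not part of the official release: the function names build_command and run_converter are hypothetical, and it assumes that java is on the PATH and that the converter signals failure with a non-zero exit code.

```python
import subprocess
from pathlib import Path


def build_command(jar, input_dir, target_dir):
    """Return the argument list for: java -jar <jar> <input_dir> <target_dir>."""
    return ["java", "-jar", str(jar), str(input_dir), str(target_dir)]


def run_converter(jar, input_dir, target_dir):
    """Run one converter; raise if the input directory is missing or java fails.

    Assumption: a non-zero exit code means the conversion failed.
    """
    if not Path(input_dir).is_dir():
        raise FileNotFoundError(f"input directory not found: {input_dir}")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(build_command(jar, input_dir, target_dir), check=True)


# Example call (hypothetical jar file name and directory names):
# run_converter("mmax2tei-standalone.jar", "corpus_mmax", "corpus_tei")
```

Since no converter goes directly from MMAX to brat, a corpus in MMAX PCC format would presumably be converted in two calls: first with mmax2tei, then with tei2brat on the resulting directory.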


mmax2tei 1.0 is available to download in two versions:

tei2brat 1.0 is available to download in two versions:

tei2mmax 1.0 is available to download in two versions:

Compiling mmax2tei or tei2mmax requires the mmaxAPI library, available to download in two versions:

You may also want to see other Polish Coreference Tools.


When using any of the converters, please cite the following article:

Maciej Ogrodniczuk, Katarzyna Głowińska, Mateusz Kopeć, Agata Savary, and Magdalena Zawisławska. Polish Coreference Corpus. In Zygmunt Vetulani, editor, Proceedings of the 6th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, pages 494–498, Poznań, Poland, 2013. Wydawnictwo Poznańskie, Fundacja Uniwersytetu im. Adama Mickiewicza.