Automatic detection and correction of annotation errors in Polish language corpora

Project factsheet

English name:	Automatic detection and correction of annotation errors in Polish language corpora
Polish name:	Automatyczne wykrywanie i korekcja błędów anotacyjnych w polskich korpusach językowych
Project type:	A National Science Centre research grant (number 2011/01/N/ST6/01107)
Duration:	21 December 2011 ‒ 20 December 2013
Principal investigator:	Łukasz Kobyliński
Institution:	Institute of Computer Science, Polish Academy of Sciences

Project summary

The project has three main goals: to improve existing methods for automatically detecting annotation errors in text corpora at the morpho-syntactic level, to develop an accurate error-detection method for Polish language resources, and to provide an efficient tool for automatically correcting tagging errors in English and Polish corpora.
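A simplified form of variation detection shows what automatic error detection can look like in practice. It follows the spirit of the variation n-gram approach of Dickinson and Meurers: if the same word appears in the same context but receives different tags at different places in the corpus, at least one of those occurrences is suspect. This is an illustrative sketch under stated assumptions, not the method developed in this project. The function name `find_variation_nuclei`, the `(word, tag)` token format and the example tags, which loosely follow NKJP tagset class names, are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical token format: (word form, morpho-syntactic tag).
Token = Tuple[str, str]


def find_variation_nuclei(
    sentences: List[List[Token]], context: int = 1
) -> Dict[Tuple[str, ...], Set[str]]:
    """Return word windows whose centre word is tagged inconsistently.

    A window is the centre word plus `context` words on each side.
    Simplified variation detection; not the project's own method.
    """
    tags_by_window: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for sentence in sentences:
        words = [word for word, _ in sentence]
        for i, (_, tag) in enumerate(sentence):
            # Skip positions whose context window would leave the sentence.
            if i - context < 0 or i + context >= len(sentence):
                continue
            window = tuple(words[i - context : i + context + 1])
            tags_by_window[window].add(tag)
    # Keep only windows in which the centre word received more than one tag.
    return {w: tags for w, tags in tags_by_window.items() if len(tags) > 1}


if __name__ == "__main__":
    # Toy corpus with one deliberately inconsistent tag on "kota".
    corpus = [
        [("ma", "fin"), ("kota", "subst"), ("i", "conj")],
        [("ma", "fin"), ("kota", "adj"), ("i", "conj")],
        [("ma", "fin"), ("psa", "subst"), ("i", "conj")],
    ]
    for window, tags in find_variation_nuclei(corpus).items():
        print(" ".join(window), sorted(tags))  # prints: ma kota i ['adj', 'subst']
```

Variation alone does not say which tag is wrong. It only narrows the search to candidates that a human annotator or a correction step then has to resolve. Larger context windows give fewer but more reliable flags.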

The quality of low-level (morpho-syntactic) corpus annotation is crucial, because this annotation is used to train automatic taggers. A gold-standard subcorpus is often selected from a larger collection of documents and used as training material for taggers, which then annotate the complete corpus. The accuracy of annotation in such a subcorpus therefore affects the tagging quality of the entire corpus. It also has a direct impact on the accuracy of higher levels of text processing, such as semantic layers of annotation.
