Automatic detection and correction of annotation errors in Polish language corpora

Project factsheet

English name:

Automatic detection and correction of annotation errors in Polish language corpora

Polish name:

Automatyczne wykrywanie i korekcja błędów anotacyjnych w polskich korpusach językowych

Project type:

A National Science Centre research grant (number 2011/01/N/ST6/01107)

Duration:

21 December 2011 ‒ 20 December 2013

Principal investigator:

Łukasz Kobyliński

Institution:

Institute of Computer Science, Polish Academy of Sciences

Project summary

The main goals of the project are: to improve existing methods for the automated detection of annotation errors in text corpora at the morpho-syntactic level, to develop an accurate method of such error detection for Polish language resources, and to provide an efficient tool that can be used to automatically correct tagging errors in English and Polish corpora.
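As an illustration of what an already known detection method can look like, the Python sketch below implements one well-known idea from the literature, the variation n-gram approach of Dickinson and Meurers, reduced here to trigrams. If the same word occurs in the same one-word left and right context in several places but receives different tags, the less frequent tag is flagged as a candidate error. This is not presented as the project's own method. The corpus format (a list of sentences made of `(word, tag)` pairs), the simplified tags, and the function names `contexts`, `variation_trigrams` and `flag_candidates` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical corpus format: a list of sentences, each a list of (word, tag)
# pairs. The tags below are simplified placeholders, not a real tagset.


def contexts(sentence):
    """Yield (token_index, (left, word, right), tag) for every token."""
    # Pad with boundary markers so the first and last words also have context.
    words = ["<s>"] + [word for word, _ in sentence] + ["</s>"]
    for i, (word, tag) in enumerate(sentence):
        yield i, (words[i], word, words[i + 2]), tag


def variation_trigrams(corpus):
    """Return the trigram contexts whose middle word got more than one tag.

    Maps (left, word, right) to a Counter of the tags assigned to the word.
    """
    tags_by_context = defaultdict(Counter)
    for sentence in corpus:
        for _, context, tag in contexts(sentence):
            tags_by_context[context][tag] += 1
    # Keep only the "variation nuclei": identical contexts with differing tags.
    return {ctx: tags for ctx, tags in tags_by_context.items() if len(tags) > 1}


def flag_candidates(corpus):
    """List tokens whose tag is less frequent than the majority tag
    observed for the same word in the same trigram context.

    Each result is (sentence_index, token_index, word, tag, majority_tag).
    """
    variations = variation_trigrams(corpus)
    candidates = []
    for s_idx, sentence in enumerate(corpus):
        for t_idx, context, tag in contexts(sentence):
            tags = variations.get(context)
            if tags is None:
                continue
            # Ties are not flagged: only strictly minority tags are suspicious.
            if tags[tag] < max(tags.values()):
                majority = tags.most_common(1)[0][0]
                candidates.append((s_idx, t_idx, context[1], tag, majority))
    return candidates


# Usage on a tiny hypothetical corpus: "jest" is tagged "noun" once
# in a context where it is otherwise tagged "verb".
example_corpus = [
    [("To", "pron"), ("jest", "verb"), ("dobre", "adj")],
    [("To", "pron"), ("jest", "verb"), ("dobre", "adj")],
    [("To", "pron"), ("jest", "noun"), ("dobre", "adj")],
]
print(flag_candidates(example_corpus))  # [(2, 1, 'jest', 'noun', 'verb')]
```

Variation in identical contexts can also reflect genuine ambiguity, so the majority vote used here only points to positions worth reviewing; it does not by itself prove that the minority tag is wrong.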

The quality of low-level (morpho-syntactic) corpus annotation is crucial, because this annotation is used to train the automated taggers themselves. A gold-standard subcorpus is often selected from a larger collection of documents and serves as training material for taggers, which are then used to annotate the complete corpus. The precision of annotation in such a subcorpus therefore influences the tagging quality of the entire corpus and has a direct impact on the accuracy of higher levels of text processing, e.g. semantic layers of annotation.