Diff for "PNET"

Differences between revisions 3 and 5 (spanning 2 versions)
Revision 3 as of 2012-10-26 10:33:27
Size: 1578
Editor: AgataSavary
Comment:
Revision 5 as of 2012-10-26 10:39:30
Size: 1676
Editor: AgataSavary
Comment:
Triggers for Polish Named Entities

PNET (Polish Named Entity Triggers) is a list of external and internal evidence (trigger words) for Polish named entities (NEs). An external or internal NE evidence is a word or a sequence of words that frequently appears in the vicinity of, or inside, named entities and is a good indicator of their type. For instance, aktor ('actor') is an external evidence for person names (as in aktor [Zbigniew Buczkowski]), while von is an internal evidence for the same type ([John von Neumann]). Many words can serve as both external and internal evidence, e.g. jezioro ('lake') is an external evidence in jezioro [Mamry] ('[Mamry] lake') and an internal evidence in [Jezioro Białe] ('[White Lake]'). External and internal NE evidence can be used in automatic NE recognition with grammar-based or machine-learning methods.
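
The distinction between external and internal evidence can be illustrated with a minimal Python sketch. It does not read the PNET archive and assumes nothing about its file format: the `TRIGGERS` dictionary, the type labels `person` and `geographic`, and the function `find_evidence` are all hypothetical names introduced here for illustration. As a simplification, "vicinity" is reduced to the single token immediately before the candidate NE.

```python
# Hypothetical in-memory trigger list built from the examples in the text.
# The real PNET archive format and the NKJP type labels are NOT assumed here.
# Maps a lowercased trigger form to (illustrative NE type, allowed roles).
TRIGGERS = {
    "aktor": ("person", {"external"}),
    "von": ("person", {"internal"}),
    "jezioro": ("geographic", {"external", "internal"}),
}


def find_evidence(tokens, span):
    """Return (token, role, ne_type) triples for a candidate NE.

    tokens: list of word strings
    span:   (start, end) indices of the candidate NE, end exclusive
    """
    start, end = span
    hits = []

    # External evidence: simplified to the token right before the candidate.
    if start > 0:
        entry = TRIGGERS.get(tokens[start - 1].lower())
        if entry and "external" in entry[1]:
            hits.append((tokens[start - 1], "external", entry[0]))

    # Internal evidence: any trigger occurring inside the candidate itself.
    for token in tokens[start:end]:
        entry = TRIGGERS.get(token.lower())
        if entry and "internal" in entry[1]:
            hits.append((token, "internal", entry[0]))

    return hits


# Usage with the examples from the description above.
print(find_evidence(["aktor", "Zbigniew", "Buczkowski"], (1, 3)))
print(find_evidence(["John", "von", "Neumann"], (0, 3)))
print(find_evidence(["jezioro", "Mamry"], (1, 2)))
print(find_evidence(["Jezioro", "Białe"], (0, 2)))
```

In a grammar-based or machine-learning recognizer, hits of this kind would typically serve as features or rule conditions rather than as a final decision.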

The list was created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the NKJP NE typology.

The list contains over 28,000 inflected forms and over 1,500 lemmas.

Authors

  • Małgorzata Baron - lexicography
  • Leszek Manicki - automatic extraction
  • Agata Savary - inflection and validation

License

The resource is available under the 2-clause BSD license.

Available resources

  • Archive containing the annotated triggers and their description.