Revision 7 as of 2012-10-26 10:48:31

PNET

Triggers for Polish Named Entities

PNET (Polish Named Entity Triggers) is a list of external and internal evidence (trigger words) for Polish named entities (NEs). A piece of external or internal NE evidence is a word or sequence of words that frequently appears in the vicinity of, or inside, named entities and is a good indicator of their type. For instance, aktor ('actor') is external evidence for person names (as in aktor [Zbigniew Buczkowski]), while von is internal evidence for the same type ([John von Neumann]). Many words can serve as both, e.g. jezioro ('lake') is external evidence in jezioro [Mamry] ('[Mamry] lake') and internal evidence in [Jezioro Białe] ('[White Lake]'). Both kinds of evidence can be used in automatic NE recognition with grammar-based or machine-learning methods.
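
The distinction between external and internal evidence can be illustrated with a minimal Python sketch. It is not part of PNET and does not read the distributed archive: the `TRIGGERS` table, the `evidence_for` function and the type labels `person` and `geographic` are hypothetical names chosen for illustration, and the entries are only the examples given above. External evidence is simplified here to the single token directly preceding the candidate NE.

```python
# Toy trigger table (hypothetical; entries are the examples from the text,
# type labels are illustrative and not the NKJP labels).
TRIGGERS = {
    "aktor": {"external": {"person"}},
    "von": {"internal": {"person"}},
    "jezioro": {"external": {"geographic"}, "internal": {"geographic"}},
}


def evidence_for(tokens, start, end):
    """Collect trigger evidence for the candidate NE tokens[start:end]."""
    found = []
    # External evidence: the token immediately before the candidate span.
    if start > 0:
        word = tokens[start - 1].lower()
        for ne_type in sorted(TRIGGERS.get(word, {}).get("external", ())):
            found.append(("external", word, ne_type))
    # Internal evidence: any token inside the candidate span.
    for token in tokens[start:end]:
        word = token.lower()
        for ne_type in sorted(TRIGGERS.get(word, {}).get("internal", ())):
            found.append(("internal", word, ne_type))
    return found


print(evidence_for(["aktor", "Zbigniew", "Buczkowski"], 1, 3))
print(evidence_for(["John", "von", "Neumann"], 0, 3))
print(evidence_for(["jezioro", "Mamry"], 1, 2))
print(evidence_for(["Jezioro", "Białe"], 0, 2))
```

The sketch matches lowercased surface forms only. A grammar-based or machine-learning recognizer would more likely use such matches as rules or features over a wider context window rather than as a final decision.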

The list was created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the NKJP NE typology.

The list contains over 28,000 inflected forms and over 1,500 lemmas.

Authors

  • Małgorzata Baron - lexicography
  • Leszek Manicki - automatic extraction
  • Agata Savary - inflection and validation

License

The resource is available under the 2-clause BSD license.

Available resources

  • Archive containing the annotated triggers and their description.

Future work

  • Morphological description of trigger words unknown to PoliMorf 0.6.1 (e.g. agromiasteczko).

  • Morpho-syntactic description of multi-word triggers (e.g. obwód autonomiczny).