Triggers for Polish Named Entities
PNET (Polish Named Entity Triggers) is an electronic lexicon containing partly inflected external or internal evidences, or trigger words, for Polish named entities (NEs). An external or internal NE evidence is a word or a list of words which appears frequently in the vicinity of or inside named entities and is a good indicator of these NEs' types. For instance, aktor ('actor') is an external evidence for person names (as in aktor [Zbigniew Buczkowski]), while von is an internal evidence for the same type ([John von Neumann]). Many words can be both external and internal evidences, e.g. jezioro ('lake') is an external evidence in jezioro [Mamry] ('[Mamry] lake') and an internal evidence in [Jezioro Białe] ('[White Lake]'). External and internal NE evidences can be used in automatic NE recognition via grammar-based or machine-learning methods.
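As a rough illustration of how such evidences might feed a grammar-based recognizer, the sketch below guesses the type of an already delimited NE candidate from the examples above. Everything in it is an assumption made for illustration: the two toy dictionaries, the type labels "person" and "geographical" (which are not the NKJP labels), and the function name guess_type are hypothetical and do not reflect the format of the PNET files.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicons built from the examples in the text.
# The real PNET resource has its own format and far more entries.
EXTERNAL_TRIGGERS: Dict[str, str] = {"aktor": "person", "jezioro": "geographical"}
INTERNAL_TRIGGERS: Dict[str, str] = {"von": "person", "jezioro": "geographical"}


def guess_type(tokens: List[str], span: Tuple[int, int]) -> Optional[str]:
    """Guess the type of the NE candidate tokens[start:end] (end exclusive).

    Internal evidence (a trigger inside the candidate) is checked first,
    then external evidence (a trigger immediately before the candidate).
    Returns None when no trigger is found.
    """
    start, end = span
    for token in tokens[start:end]:
        ne_type = INTERNAL_TRIGGERS.get(token.lower())
        if ne_type is not None:
            return ne_type
    if start > 0:
        return EXTERNAL_TRIGGERS.get(tokens[start - 1].lower())
    return None


if __name__ == "__main__":
    print(guess_type(["aktor", "Zbigniew", "Buczkowski"], (1, 3)))
    print(guess_type(["John", "von", "Neumann"], (0, 3)))
    print(guess_type(["Jezioro", "Białe"], (0, 2)))
```

The sketch matches lowercased surface forms only and looks at a single token of left context. A realistic setup would also look up inflected forms and handle multi-word triggers.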
The list was created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the NKJP NE typology.
The list contains over 28,000 inflected forms and over 1,500 lemmas.
- Małgorzata Baron - lexicography
- Leszek Manicki - automatic extraction
- Agata Savary - inflection and validation
The resource is available under the 2-clause BSD licence.
Archive containing the annotated triggers and their description.
Morphological description of trigger words unknown to PoliMorf 0.6.1 (e.g. agromiasteczko).
Morpho-syntactic description of multi-word triggers (e.g. obwód autonomiczny).