Triggers for Polish Named Entities
PNET (Polish Named Entity Triggers) contains a list of external and internal evidences (trigger words) of Polish named entities (NEs). An external or internal NE evidence is a word or sequence of words that frequently appears in the vicinity of or inside named entities and is a good indicator of their type. For instance, aktor ('actor') is an external evidence for person names (as in aktor [Zbigniew Buczkowski]), while von is an internal evidence for the same type ([John von Neumann]). Many words can be both external and internal evidences, e.g. jezioro ('lake') is an external evidence in jezioro [Mamry] ('[Mamry] lake') and an internal evidence in [Jezioro Białe] ('[White Lake]'). External and internal NE evidences can be used in automatic NE recognition via grammar-based or machine-learning methods.
The list has been created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the NKJP NE typology.
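The distinction between external and internal evidence can be illustrated with a minimal Python sketch. It is not part of PNET and does not read the archive: the two small dictionaries, the type labels ("person", "geographical") and the function name type_hints are assumptions made up for this illustration, using only the examples given above. It checks the word directly before a candidate span (external evidence) and the words inside the span (internal evidence).

```python
# Hypothetical, hand-made trigger tables built from the examples in the text.
# The real PNET file format and the NKJP type names are NOT assumed here.
EXTERNAL = {"aktor": "person", "jezioro": "geographical"}
INTERNAL = {"von": "person", "jezioro": "geographical"}


def type_hints(tokens, start, end):
    """Collect NE type hints for the candidate span tokens[start:end]."""
    hints = []
    # External evidence: the word immediately before the candidate span.
    if start > 0:
        before = tokens[start - 1].lower()
        if before in EXTERNAL:
            hints.append(("external", before, EXTERNAL[before]))
    # Internal evidence: any word inside the candidate span.
    for token in tokens[start:end]:
        word = token.lower()
        if word in INTERNAL:
            hints.append(("internal", word, INTERNAL[word]))
    return hints


if __name__ == "__main__":
    print(type_hints(["aktor", "Zbigniew", "Buczkowski"], 1, 3))
    print(type_hints(["John", "von", "Neumann"], 0, 3))
    print(type_hints(["jezioro", "Mamry"], 1, 2))
    print(type_hints(["Jezioro", "Białe"], 0, 2))
```

A grammar-based recognizer could treat such hints as rule conditions, while a machine-learning recognizer could use them as features. The sketch matches exact lowercased word forms only, so a realistic version would need a table of inflected forms rather than lemmas.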
The list contains over 28,000 inflected forms and over 1,500 lemmas.
Authors
- Małgorzata Baron - lexicography
- Leszek Manicki - automatic extraction
- Agata Savary - inflection and validation
License
The resource is available under the 2-clause BSD license.
Available resources
Archive (PNET.tar.gz) containing the annotated triggers and their description.