Triggers for Polish Named Entities
PNET (Polish Named Entity Triggers) is a list of external and internal evidence (trigger words) for Polish named entities (NEs). A piece of external or internal NE evidence is a word or a sequence of words that frequently appears in the vicinity of a named entity (external) or inside it (internal) and is a good indicator of the entity's type. For instance, aktor ('actor') is external evidence for person names (as in aktor [Zbigniew Buczkowski]), while von is internal evidence for the same type ([John von Neumann]). Many words can serve as both external and internal evidence, e.g. jezioro ('lake') is external evidence in jezioro [Mamry] ('[Mamry] lake') and internal evidence in [Jezioro Białe] ('[White Lake]'). External and internal NE evidence can be used in automatic NE recognition via grammar-based or machine-learning methods.
The list was created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the NKJP (http://nkjp.pl) NE typology.
The list contains over 28,000 inflected forms and over 1,500 lemmas.
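The distinction between external and internal evidence described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `TRIGGERS` table, the type labels, the `Evidence` class and the `find_evidence` function are hypothetical and do not reflect the actual file format or contents of the PNET archive.

```python
from dataclasses import dataclass

# Hypothetical toy trigger table: inflected form -> (lemma, NE type).
# The real PNET data format and type labels may differ.
TRIGGERS = {
    "aktor": ("aktor", "persName"),
    "aktora": ("aktor", "persName"),
    "von": ("von", "persName"),
    "jezioro": ("jezioro", "geogName"),
    "jeziora": ("jezioro", "geogName"),
}


@dataclass(frozen=True)
class Evidence:
    kind: str     # "external" or "internal"
    form: str     # the token as it appears in the text
    lemma: str    # lemma of the trigger
    ne_type: str  # NE type suggested by the trigger


def find_evidence(tokens, start, end, window=1):
    """Collect trigger evidence for the candidate NE tokens[start:end]."""
    found = []
    # Internal evidence: triggers located inside the candidate span.
    for tok in tokens[start:end]:
        entry = TRIGGERS.get(tok.lower())
        if entry:
            found.append(Evidence("internal", tok, *entry))
    # External evidence: triggers within `window` tokens of the span.
    context = tokens[max(0, start - window):start] + tokens[end:end + window]
    for tok in context:
        entry = TRIGGERS.get(tok.lower())
        if entry:
            found.append(Evidence("external", tok, *entry))
    return found


# Candidate spans correspond to the bracketed names in the examples above.
print(find_evidence(["aktor", "Zbigniew", "Buczkowski"], 1, 3))
print(find_evidence(["John", "von", "Neumann"], 0, 3))
print(find_evidence(["Jezioro", "Białe"], 0, 2))
```

In a grammar-based recognizer such evidence could drive typing rules directly, while a machine-learning approach might instead expose the `kind` and `ne_type` values as features of each candidate span.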
Authors
- Małgorzata Baron - lexicography
- Leszek Manicki - automatic extraction
- Agata Savary - inflection and validation
License
The resource is available under the 2-clause BSD license.
Available resources
- Archive (PNET.tar.gz) containing the annotated triggers and their description.
Future work
- Morphological description of trigger words unknown to PoliMorf 0.6.1 (e.g. agromiasteczko, 'agrotown').
- Morpho-syntactic description of multi-word triggers (e.g. obwód autonomiczny, 'autonomous oblast').