Diff for "PNET"

Differences between revisions 5 and 9 (spanning 4 versions)
Revision 5 as of 2012-10-26 10:39:30
Size: 1676
Editor: AgataSavary
Comment:
Revision 9 as of 2012-10-26 10:59:00
Size: 1946
Editor: MichalLenart
Comment:

Triggers for Polish Named Entities

PNET (Polish Named Entity Triggers) is an electronic lexicon of partly inflected trigger words, i.e. external and internal evidence, for Polish named entities (NEs). An external or internal NE evidence is a word or a sequence of words that frequently appears in the vicinity of, or inside, named entities and is a good indicator of their types. For instance, aktor ('actor') is an external evidence for person names (as in aktor [Zbigniew Buczkowski]), while von is an internal evidence for the same type ([John von Neumann]). Many words can serve as both, e.g. jezioro ('lake') is an external evidence in jezioro [Mamry] ('[Mamry] lake') and an internal evidence in [Jezioro Białe] ('[White Lake]'). Both kinds of evidence can be used in automatic NE recognition with grammar-based or machine-learning methods.
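
As a rough illustration of how such evidence might be used in a grammar-based recognizer, the sketch below guesses the type of a candidate name by checking the word immediately before it (external evidence) and the words inside it (internal evidence). The two small dictionaries, the type labels "person" and "geographic", and the function name guess_type are hypothetical, hand-written from the examples above; they do not reflect the actual PNET file format or its type inventory.

```python
# Hypothetical toy trigger lists built from the examples in the text;
# the real PNET lexicon has its own format and NE type labels.
EXTERNAL_TRIGGERS = {"aktor": "person", "jezioro": "geographic"}
INTERNAL_TRIGGERS = {"von": "person", "jezioro": "geographic"}


def guess_type(tokens, start, end):
    """Guess the type of the candidate name tokens[start:end].

    Returns (type, evidence_kind), or (None, None) if no trigger matches.
    """
    # External evidence: the token right before the candidate name.
    if start > 0:
        ne_type = EXTERNAL_TRIGGERS.get(tokens[start - 1].lower())
        if ne_type is not None:
            return ne_type, "external"
    # Internal evidence: any token inside the candidate name.
    for token in tokens[start:end]:
        ne_type = INTERNAL_TRIGGERS.get(token.lower())
        if ne_type is not None:
            return ne_type, "internal"
    return None, None


print(guess_type(["aktor", "Zbigniew", "Buczkowski"], 1, 3))
print(guess_type(["John", "von", "Neumann"], 0, 3))
print(guess_type(["Jezioro", "Białe"], 0, 2))
```

The sketch matches lowercased surface forms only. A realistic system would presumably look up the inflected forms listed in the lexicon and combine trigger matches with other features.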

The list was created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the NKJP NE typology.

The list contains over 28,000 inflected forms and over 1,500 lemmas.

Authors

  • Małgorzata Baron - lexicography
  • Leszek Manicki - automatic extraction
  • Agata Savary - inflection and validation

License

The resource is available under the 2-clause BSD licence.

Available resources

  • Archive containing the annotated triggers and their description.

Future work

  • Morphological description of trigger words unknown to PoliMorf 0.6.1 (e.g. agromiasteczko).
  • Morpho-syntactic description of multi-word triggers (e.g. obwód autonomiczny).