Diff for "PNET"

Differences between revisions 1 and 7 (spanning 6 versions)
Revision 1 as of 2012-10-26 10:14:41
Size: 21
Editor: AgataSavary
Comment:
Revision 7 as of 2012-10-26 10:48:31
Size: 1887
Editor: AgataSavary
Comment:
= Triggers for Polish Named Entities =

PNET (Polish Named Entity Triggers) is a list of external and internal evidence (trigger words) for Polish named entities (NEs).
External or internal NE evidence is a word or sequence of words that frequently appears in the vicinity of a named entity (external) or inside it (internal) and is a good indicator of the entity's type. For instance, ''aktor'' ('actor') is external evidence for person names (as in ''aktor [Zbigniew Buczkowski]''), while ''von'' is internal evidence for the same type (''[John von Neumann]''). Many words can serve as both external and internal evidence, e.g. ''jezioro'' ('lake') is external evidence in ''jezioro [Mamry]'' ('[Mamry] lake') and internal evidence in ''[Jezioro Białe]'' ('[White Lake]'). External and internal NE evidence can be used in automatic NE recognition via grammar-based or machine-learning methods.
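
The distinction between the two kinds of evidence can be illustrated with a minimal Python sketch. It assumes a hypothetical lexicon that maps inflected trigger forms to a lemma and an NE type; the `TRIGGERS` dictionary, the type labels and the `find_evidence` function are illustrative only and do not reflect the actual format of the PNET archive.

```python
# Hypothetical trigger lexicon: inflected form -> (lemma, NE type).
# Entries and type labels are illustrative, not taken from the PNET archive.
TRIGGERS = {
    "aktor": ("aktor", "person"),
    "aktora": ("aktor", "person"),
    "von": ("von", "person"),
    "jezioro": ("jezioro", "geographic"),
    "jeziora": ("jezioro", "geographic"),
}


def find_evidence(tokens, start, end):
    """Return trigger hits for the NE candidate tokens[start:end].

    A trigger immediately before the span counts as external evidence;
    a trigger inside the span counts as internal evidence.
    """
    hits = []
    # External evidence: the token directly preceding the candidate.
    if start > 0:
        entry = TRIGGERS.get(tokens[start - 1].lower())
        if entry:
            hits.append(("external", entry[0], entry[1]))
    # Internal evidence: any token inside the candidate span.
    for token in tokens[start:end]:
        entry = TRIGGERS.get(token.lower())
        if entry:
            hits.append(("internal", entry[0], entry[1]))
    return hits


# "aktor" precedes the candidate [Zbigniew Buczkowski]: external evidence.
print(find_evidence(["aktor", "Zbigniew", "Buczkowski"], 1, 3))
# "von" lies inside the candidate [John von Neumann]: internal evidence.
print(find_evidence(["John", "von", "Neumann"], 0, 3))
```

This sketch only checks the single token to the left of the candidate. A real recognizer would probably use a wider context window and handle multi-word triggers, and would treat the hits as features for a grammar or a classifier instead of as final decisions.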

The list was created semi-automatically from a subset of Polish Wikipedia whose infobox types and categories were manually mapped onto the [[http://nkjp.pl|NKJP]] NE typology.

The list contains over 28,000 inflected forms and over 1,500 lemmas.

== Authors ==
 * Małgorzata Baron - lexicography
 * Leszek Manicki - automatic extraction
 * [[http://www.info.univ-tours.fr/~savary/English/indexgb.html|Agata Savary]] - inflection and validation

== License ==

The resource is available under the [[http://en.wikipedia.org/wiki/BSD_licenses#2-clause_license_.28.22Simplified_BSD_License.22_or_.22FreeBSD_License.22.29|2-clause BSD license]].

== Available resources ==

 * [[attachment:PNET.tar.gz|Archive]] containing the annotated triggers and their description.

== Future work ==

 * Morphological description of trigger words unknown to PoliMorf 0.6.1 (e.g. ''agromiasteczko'').
 * Morphosyntactic description of multi-word triggers (e.g. ''obwód autonomiczny'').
