Diff for "WikiTopoPl"

Differences between revisions 3 and 4
Revision 3 as of 2012-07-26 10:37:44
Size: 70
Editor: MichalLenart
Comment:
Revision 4 as of 2012-07-26 11:18:21
Size: 1511
Comment:

The multilingual lexicon of toponyms (WikiTopoPl) contains a list of over 155,000 Polish geographical proper names (countries, cities, regions, hydronyms, etc.) and their equivalents in Bulgarian, German, Modern Greek, English and Romanian. These data have been automatically extracted from the open encyclopedia Wikipedia, wherever an equivalent was available. The Wikipedia categories attached to the lexicon entries have been mapped to a short list of succinct categories compliant with Prolexbase, a multilingual ontology of proper names.

The lexicon contains translations of over 155,000 Polish geographical proper names as follows:
 * over 8,000 Bulgarian translations,
 * over 43,000 German translations,
 * over 3,000 Modern Greek translations,
 * over 155,000 English translations,
 * over 19,000 Romanian translations.

== Authors ==
 * Leszek Manicki

== License ==
The lexicon is available under the [[http://creativecommons.org/licenses/by-sa/3.0/|Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA)]].

== Available resources ==
 * [[http://bach.ipipan.waw.pl/redmine/attachments/116/wikipedia_translations.prolexbase_types.wikipedia_types.no_buildings.txt.bz2|lexicon as a text file]] (1.4 MB)
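
The downloaded file is a bzip2-compressed text file, so it can be read without unpacking it first. The following is a minimal sketch only: the function name `read_lexicon` is hypothetical, and the tab delimiter and UTF-8 encoding are assumptions, since this page does not document the layout of the file. Inspect a few lines of the actual file and adjust both before relying on the result.

```python
import bz2
from typing import Iterator, List


def read_lexicon(path: str, delimiter: str = "\t") -> Iterator[List[str]]:
    """Yield one list of fields per non-empty line of a bz2-compressed text file.

    Assumptions (not confirmed by the lexicon page): the file is UTF-8
    encoded and its fields are separated by `delimiter` (tab by default).
    """
    # Text mode ("rt") decompresses and decodes on the fly, line by line.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line:  # skip blank lines
                yield line.split(delimiter)
```

For example, `next(read_lexicon("lexicon.txt.bz2"))` returns the fields of the first record, where `lexicon.txt.bz2` stands for the local name of the downloaded file.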

== Future work ==
Expanding the lexicon by:
 * using a newer Wikipedia dump,
 * adding new languages to the lexicon,
 * expanding the set of Wikipedia article categories to be included in the lexicon.
