
Prolexbase 2.0

Prolexbase 2.0 is a multilingual relational dictionary of proper names, conceived initially at the University of Tours, France and at the University of Belgrade, Serbia, and further developed at the Polish Academy of Sciences (IPIPAN). It contains a language-independent typology of proper names with 4 supertypes and 34 types, as well as various language-independent or language-specific relations (synonymy, meronymy, accessibility, variation etc.). A pivot-oriented design of concepts aligns proper names in one language with their counterparts in other languages. A large majority of the data have been extracted from Wikipedia and GeoNames. All data have been manually validated.
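The pivot-oriented design can be illustrated with a minimal Python sketch. It is not the Prolexbase schema: the class `Pivot`, the function `counterparts`, the field names and the sample entry are all hypothetical and serve only to show how a language-independent pivot can link names across languages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pivot:
    """Hypothetical language-independent concept (not the real Prolexbase table)."""
    pivot_id: int
    type_name: str  # one type from the typology, e.g. "city" (illustrative)
    names: Dict[str, List[str]] = field(default_factory=dict)

    def add_name(self, lang: str, name: str) -> None:
        # Attach a language-specific proper name to this pivot.
        self.names.setdefault(lang, []).append(name)


def counterparts(pivots: List[Pivot], name: str, source: str, target: str) -> List[str]:
    """Return the target-language names sharing a pivot with `name`."""
    for pivot in pivots:
        if name in pivot.names.get(source, []):
            return list(pivot.names.get(target, []))
    return []


# Illustrative entry only; not taken from the Prolexbase data.
london = Pivot(pivot_id=1, type_name="city")
london.add_name("en", "London")
london.add_name("fr", "Londres")
london.add_name("pl", "Londyn")

print(counterparts([london], "Londres", "fr", "pl"))
```

Because every name points to a pivot rather than directly to names in other languages, adding a new language requires only new name-to-pivot links, not pairwise links between languages.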

Prolexbase creation has been supported by the following projects:

Technolangue programme from the French Ministry of Industry (2003-2005),
Egide Pavle-Savic programme from the French Ministry of Foreign Affairs, the French Ministry of Research, and the Serbian Ministry of Science (2004-2005),
ERDF Nekst project (2009-2014),
European (CIP ICT-PSP) CESAR project, part of META-NET (2011-2013).

The construction and contents of Prolexbase have been described in:

Savary, A., Manicki, L., Baron, M.: ProlexFeeder – Populating a Multilingual Ontology of Proper Names from Open Sources. Submitted to Journal of Language Modelling.
Bouchou, B., Maurel, D. (2013): Prolmf, a Multilingual Dictionary of Proper Names and their Relations. In Gil Francopoulo (ed.), LMF: Lexical Markup Framework, theory and practice, Hermes-science, to appear.
Spędzia, M., Maurel, D., Savary, A. (2011): Multilingual Relational Database of Proper Names: Prolexbase Documentation. Technical report #297, Laboratoire d'informatique, Université François Rabelais Tours.
Bouchou, B., Maurel, D. (2008): Prolexbase et LMF : vers un standard pour les ressources lexicales sur les noms propres. In Traitement Automatique des Langues, 49(1).
Maurel, D. (2008): Prolexbase: a Multilingual Relational Lexical Database of Proper Names. In proceedings of LREC 2008, Marrakech, Morocco.
Tran, M., Maurel, D. (2006): Prolexbase. Un dictionnaire relationnel multilingue de noms propres. In Traitement Automatique des Langues, 47(3).
Krstev S., Vitas D., Maurel D., Tran M. (2005): Multilingual Ontology of Proper Names. In Second Language & Technology Conference (LTC'05), Poznań, Poland.

Prolexbase 2.0 contains the following interlinked data:

67,000 language-independent pivots,
40,000 Polish proper names and their corresponding 165,000 inflected forms,
33,000 English proper names and their corresponding 18,000 inflected forms,
100,000 French proper names and their corresponding 142,000 inflected forms,
65,500 relations.

See also Prolexbase on CNRTL for a previous version of the French data, serialized in an LMF standard format.

Authors

Małgorzata Baron - lexicography,
Béatrice Bouchou Markhoff - LMF format design,
Leszek Manicki - design and implementation of ProlexFeeder (Prolexbase population from Wikipedia),
Denis Maurel - design and dissemination, project management,
Agata Savary - project management,
Mickaël Tran - database design and implementation,
Duško Vitas - design and management of Serbian data.

Tools

ProlexFeeder, a tool for semi-automatic population of Prolexbase from open sources, notably Wikipedia and GeoNames,
Translatica's automatic inflection tool for multi-word units.

License

All Prolexbase 2.0 data are available under the CC BY-SA license, i.e. the same as for Wikipedia and GeoNames.

Available resources

Prolexbase documentation,
Prolexbase schema description,
MySQL export file (to appear),
List of all inflected forms of Polish names together with their semantic and grammatical tags (to appear),
XML Schema for LMF serialisation.

Future work

Exporting Prolexbase 2.0 to LMF.
Opening a web interface for Prolexbase 2.0 navigation.
Releasing ProlexFeeder under an open license.

Revision 35 as of 2013-01-23 12:02:13. Editor: AgataSavary.