The Linguistic Engineering Group
The Linguistic Engineering (LE) Group is part of the Department of Artificial Intelligence at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS).
People
Anna Andrzejczuk, MSc (on leave)
Kacper Chwiałkowski (part time)
Łukasz Degórski, MSc
Elżbieta Hajnicz, PhD
Łukasz Kobyliński, MSc
Katarzyna Krasnowska (part time)
Anna Kupść, PhD (on leave)
Małgorzata Marciniak, PhD
Marcin Miłkowski, PhD (part time)
Agnieszka Mykowiecka, PhD
Maciej Ogrodniczuk, PhD
Jakub Piskorski, PhD, Associate
Adam Przepiórkowski, PhD, Head of the Group
Piotr Rychlik, PhD
Tomek Strzałkowski, PhD, Foreign Associate
Danuta Skowrońska, MSc
Jan Szejko (part time)
Łukasz Szałkiewicz, MSc
Stan Szpakowicz, PhD, Foreign Associate
Aleksander Wawer, MSc
Aleksandra Wieczorek, MSc (part time)
Marcin Woliński, PhD
Alina Wróblewska, MSc
Sebastian Żurowski, PhD (part time)
Research
The main research areas of the Group:
- (Polish) corpus linguistics; cf. the IPI PAN Corpus of Polish and the National Corpus of Polish,
- syntactic and semantic parsing of Polish; cf. Spejd and Świgra,
- extraction of linguistic knowledge from corpora,
- information extraction,
- sentiment analysis,
- the morphosyntactic system of Polish,
- generative linguistic formalisms, esp. HPSG and LFG.
The Group is a member of CLARIN, FLaReNet and META-NET.
Current externally funded projects
CORE (Computer-based methods for coreference resolution in Polish texts),
NEKST (An adaptive system to support problem-solving on the basis of document collections in the Internet),
SYNAT (Creation of a universal, open repository platform for hosting and communication of networked resources of knowledge for science, education and open knowledge-based society),
ATLAS (Applied Technology for Language-Aided CMS),
CESAR (CEntral and South-east europeAn Resources).
Some of our past projects
Construction of a treebank for Polish using automatic syntactic analysis,
CLARIN (Common Language Resources and Technology Infrastructure),
NKJP (National Corpus of Polish),
Automatic detection of semantic dependencies within verb argument structures in large treebanks,
LUNA (spoken Language UNderstanding in multilinguAl communication systems), with Polish support,
LT4eL (Language Technology for eLearning),
Automatic extraction of linguistic knowledge from a large corpus of Polish.
Publicly available tools and resources
Here are some of the tools and resources created within our projects. See the CLIP pages for a more exhaustive list of Polish tools and resources.
Tools (all open source, under GPL):
Świgra – a DCG parser,
Spejd – a shallow parsing and disambiguation system,
TaKIPI – a morphosyntactic tagger for Polish,
PANTERA – a morphosyntactic tagger for Polish,
Poliqarp – a corpus indexing and search engine,
Dendrarium – a treebank development system (under development),
Anotatornia – a system for multi-level manual annotation of corpora (forthcoming),
WSDDE – a system for designing and performing Word Sense Disambiguation experiments (forthcoming).
Resources:
IPI PAN Corpus of Polish (obsolete).
Other activities
Links to some other activities of the Group:
Intelligent Information Systems series of conferences.