
Page history for "SEJF" (revisions 18 to 36, spanning 18 versions): revision 18 as of 2012-07-23 11:23:28 (size 3376, editor AgataSavary); revision 36 as of 2021-04-28 09:58:19 (size 5087, editor AgataSavary).
Grammatical Lexicon of Polish Phraseology

The Grammatical Lexicon of Polish Phraseology (SEJF = Słownik Elektroniczny Jednostek Frazeologicznych) is an electronic lexicon containing multi-word units (mainly nominal, adjectival and adverbial compounds) of the general (non-terminological) Polish language. It was created within the ERDF Nekst project (http://zil.ipipan.waw.pl/NEKST) and the IC1207 COST action PARSEME (http://www.parseme.eu).

Some aspects of its construction, contents and use have been described in:

  • GRALIŃSKI, F., SAVARY, A., CZEREPOWICKA, M., MAKOWIECKI, F. (2010): Computational Lexicography of Multi-Word Units: How Efficient Can It Be?, in Proceedings of Multiword Expressions: from Theory to Applications (MWE 2010), Workshop at COLING 2010, Beijing, China, August 28.

  • CZEREPOWICKA, M., KOSEK, I. (2011): Problemy opisu związków frazeologicznych w formalizmie „Multifleks” (na przykładzie rodzaju wyrażeń frazeologicznych), in Kopcińska, D., Bańko, M. (eds.) "Różne formy, różne treści", pp. 117–126, Warszawa 2011.

  • CZEREPOWICKA, M. (2011): „Toposław” jako narzędzie znakowania jednostek wieloczłonowych, in Matusiak-Kempa, I., Przybyszewski, S. (eds.) Nowe zjawiska w języku, tekście, komunikacji. Kontekst a komunikacja, Olsztyn, pp. 28–35.

  • CZEREPOWICKA, M. (2014): Jednostki obce w słowniku języka polskiego na przykładzie „Słownika elektronicznego jednostek frazeologicznych” (SEJF), in LingVaria (IX), vol. 1 (17), pp. 59–68 [doi: 10.12797/LV.09.2014.17.04].

  • CZEREPOWICKA, M. (2014): SEJF - Słownik elektroniczny jednostek frazeologicznych, in Język Polski (XCIV), vol. 2, pp. 116–129.

  • CZEREPOWICKA, M., SAVARY, A. (2018): SEJF - A Grammatical Lexicon of Polish Multiword Expressions, in Vetulani, Z., Mariani, J., Kubis, M. (eds.) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015, Lecture Notes in Computer Science, vol. 10930, Springer, Cham [doi: 10.1007/978-3-319-93782-3_5].

The lexicon contains about 5,000 multi-word lexemes, 93,000 corresponding inflected forms, and 160 graph-based inflection paradigms, with the following distribution:

  • 3,705 nominal compounds (e.g. bajońskie sumy),

  • 422 adjectival compounds (e.g. prosty jak strzała, wprost proporcjonalny),

  • 609 adverbial compounds (e.g. chcąc nie chcąc),

  • 40 others (e.g. ni z gruszki, ni z pietruszki).
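The category counts listed above can be tallied in a few lines, for instance to check them against a stated total. The figures are copied from this page (the current distribution, and the version 1.0 distribution given under Available resources); the variable and function names are illustrative only.

```python
# Category counts as listed on this page for the current release.
CURRENT = {"nominal": 3705, "adjectival": 422, "adverbial": 609, "other": 40}
# Category counts given on this page for SEJF version 1.0.
VERSION_1_0 = {"nominal": 2121, "adjectival": 446, "adverbial": 604, "other": 43}


def total(dist: dict[str, int]) -> int:
    """Sum the lexeme counts over all categories."""
    return sum(dist.values())


def shares(dist: dict[str, int]) -> dict[str, float]:
    """Percentage of lexemes per category, rounded to one decimal place."""
    n = total(dist)
    return {category: round(100 * count / n, 1) for category, count in dist.items()}


if __name__ == "__main__":
    print(total(CURRENT))      # 4776
    print(total(VERSION_1_0))  # 3214
    print(shares(CURRENT))
```

Nominal compounds account for roughly three quarters of the current entries.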

Authors

Tools

The lexicon was created with Toposław, a tool for developing and managing inflectional dictionaries of multi-word units. Toposław integrates:

  • Morfeusz SGJP -- a morphological analyser and generator of Polish,

  • Multiflex -- a morpho-syntactic generator of multi-word units,

  • a graph editor derived from Unitex.
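Roughly speaking, a generator of multi-word units derives the forms of a compound from the forms of its single-word components, supplied by a morphological generator such as Morfeusz SGJP. The Python sketch below illustrates only the simplest case, case agreement between the adjective and the noun in bajońskie sumy. It is a toy under stated assumptions: the hand-written PLURAL_FORMS table stands in for Morfeusz, only plural forms are covered, and nothing here reproduces Multiflex's graph formalism or its handling of fixed components and word-order variants.

```python
# Hand-written plural forms standing in for a morphological generator such as
# Morfeusz SGJP (this table is NOT Morfeusz output; it is a toy assumption).
CASES = ["nom", "gen", "dat", "acc", "inst", "loc", "voc"]
PLURAL_FORMS = {
    "bajoński": dict(zip(CASES, ["bajońskie", "bajońskich", "bajońskim",
                                 "bajońskie", "bajońskimi", "bajońskich",
                                 "bajońskie"])),
    "suma": dict(zip(CASES, ["sumy", "sum", "sumom", "sumy",
                             "sumami", "sumach", "sumy"])),
}


def inflect_agreeing(lemmas: list[str]) -> dict[str, str]:
    """Build the multi-word unit in every case by putting each component
    in that same case (simple case agreement, plural only)."""
    return {case: " ".join(PLURAL_FORMS[lemma][case] for lemma in lemmas)
            for case in CASES}


if __name__ == "__main__":
    forms = inflect_agreeing(["bajoński", "suma"])
    for case, form in forms.items():
        print(case, form)
    # Syncretic cases (nom/acc/voc) collapse into one entry in a list of forms.
    print(sorted(set(forms.values())))
```

Seven cases yield only five distinct strings here, which is one reason a list of inflected forms is smaller than the number of grammatical slots.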

License

The data are available under the CC BY-SA license.

Available resources

  • SEJF version 1.1
    • Slownik (SEJF-1.1-Slownik.tar.gz) -- the binary source file in Toposław format
    • Multiflex-compatible archive (SEJF-1.1.tar.gz) including:
      • the list of morphologically annotated lexemes,
      • the list of corresponding inflected forms and variants,
      • inflection graphs compatible with the Unitex graph editor,
      • a list of known problems,
      • a README.txt file.
  • SEJF version 1.0: about 3200 multi-word lexemes (2121 nominal, 446 adjectival, 604 adverbial, 43 others), 68,000 corresponding inflected forms, and 160 graph-based inflection paradigms
    • Slownik (Slownik.tar.gz) -- the binary source file in Toposław format
    • Multiflex-compatible archive (SEJF.tar.gz) including:
      • the list of morphologically annotated lexemes,
      • the list of corresponding inflected forms and variants,
      • inflection graphs compatible with the Unitex graph editor,
      • a list of known problems,
      • a README.txt file.
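The exact layout of the lists of lexemes and inflected forms is documented in the README.txt shipped with each archive. Assuming, for illustration only, that the inflected forms follow the Unitex DELAF-style convention "form,lemma.codes", with commas and dots inside an entry escaped by a backslash and an empty lemma meaning "same as the form", a line could be split as below. The sample lines and grammatical codes are hypothetical; check the README before relying on this.

```python
def split_unescaped(text: str, sep: str) -> tuple[str, str]:
    """Split text at the first sep that is not escaped by a backslash.

    Escapes are resolved in the left part; the right part is returned
    untouched so that it can be split again.
    """
    left: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            left.append(text[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == sep:
            return "".join(left), text[i + 1:]
        else:
            left.append(ch)
            i += 1
    raise ValueError(f"no unescaped {sep!r} in {text!r}")


def parse_entry(line: str) -> tuple[str, str, str]:
    """Parse an assumed DELAF-style line 'form,lemma.codes'."""
    form, rest = split_unescaped(line.rstrip("\n"), ",")
    lemma, codes = split_unescaped(rest, ".")
    # Assumed convention (as in Unitex DELAF): an empty lemma repeats the form.
    return form, lemma or form, codes


if __name__ == "__main__":
    print(parse_entry("bajońskich sum,bajońskie sumy.N:gen:pl"))
    print(parse_entry(r"ni z gruszki\, ni z pietruszki,.ADV"))
```

Escape handling matters for this lexicon because some entries, such as ni z gruszki, ni z pietruszki, contain the separator character themselves.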

Future work

Defining a representation of the lexicon in LMF (Lexical Markup Framework, ISO 24613).
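As a purely illustrative sketch of what an LMF-style serialization of one multi-word entry might look like, Python's standard xml.etree.ElementTree is enough. The element and attribute choices below loosely follow the feat/att/val convention used in many LMF lexicons; they are assumptions, not a planned SEJF format, and the component ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_feat(parent: ET.Element, att: str, val: str) -> None:
    """Attach an LMF-style <feat att="..." val="..."/> child."""
    ET.SubElement(parent, "feat", att=att, val=val)


def lexical_entry(lemma: str, pos: str, component_ids: list[str],
                  forms: list[tuple[str, str]]) -> ET.Element:
    """Build a hypothetical LMF-like entry for one multi-word lexeme.

    component_ids: ids of the single-word entries it is made of (hypothetical).
    forms: (written form, grammatical case) pairs.
    """
    entry = ET.Element("LexicalEntry")
    add_feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    add_feat(lemma_el, "writtenForm", lemma)
    for written, case in forms:
        word_form = ET.SubElement(entry, "WordForm")
        add_feat(word_form, "writtenForm", written)
        add_feat(word_form, "grammaticalCase", case)
    components = ET.SubElement(entry, "ListOfComponents")
    for cid in component_ids:
        ET.SubElement(components, "Component", entry=cid)
    return entry


if __name__ == "__main__":
    e = lexical_entry("bajońskie sumy", "noun", ["bajoński_adj", "suma_noun"],
                      [("bajońskie sumy", "nominative"),
                       ("bajońskich sum", "genitive")])
    print(ET.tostring(e, encoding="unicode"))
```

Keeping components as references to single-word entries, rather than copying their forms, is one way such a format could stay aligned with the graph-based paradigms described above.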