Revision 25 as of 2014-03-04 15:50:48

Polish Valence Dictionary (Walenty)

(See the bottom of this page for the latest versions of the dictionary.)

The Polish Valence Dictionary (Walenty) is an electronic dictionary of subcategorisation frames for Polish verbs and quasi-verbal predicates. Some textual snapshots of the dictionary are made available at the bottom of this page, together with an article in Polish describing its format in detail. What follows is an overview, based on earlier versions of the dictionary.

The dictionary represents valence as a list of individual frames describing a particular verbal base with a particular aspect (perfective, imperfective, or bi-aspectual, listed as _). The actual argument structure is presented as a set of positions which must be filled by phrases of appropriate types and parameters. Individual positions may be marked for their status as a subject (subj) or a passivisable direct object (obj), and for their role in control relations with other positions in the argument structure (controller and controlee).

The resource has been produced as part of the CESAR project (Central and Southeast European Resources), as well as other projects carried out at ZIL IPI PAN (http://zil.ipipan.waw.pl/), and is made available on META-SHARE.

Format

The format of the dictionary (devised by the authors listed below) is based on the electronic version of Świdziński's dictionary, but includes a number of significant changes:

  • Arguments and their parameters consistently use Latin- or English-based terms.
  • Sentential phrases (sentp) are split into three categories based on the number of parameters they have (cp for bare complementiser clauses, ncp for complementiser clauses with a correlative pronoun, prepncp for prepositional phrases involving a complementiser clause with a correlative pronoun).

  • Phrases representing arguments restricted in terms of semantic categories which may be expressed by a wider scope of syntactic constructions ("adverbial" phrases) are represented as xp and classified into specific subtypes.

  • Multi-word prepositions are represented as comprepnp.

  • Case requirements listed as the nominative and the accusative case in Świdziński's dictionary are represented as the structural case (str) in order to capture their alternation with other cases (as opposed to e.g. a lexical accusative present in prepositional phrases). In the subject position, the structural case may be nominative (in noun phrases) or a case represented as the accusative in the LFG grammar (in non-agreeing numeral phrases). In other (object) positions, the structural case stands for the accusative case or, when the predicate is negated, the genitive case.

  • Subject positions are listed as subj. This includes sentential subjects.

  • Passivisable (direct) objects are listed as obj. This includes non-accusative direct objects.

  • Control relations are represented, marking the controller and the controlee arguments. Control relations are involved e.g. in establishing the source of agreement for adjectival arguments.

  • Coordination of different types of arguments is marked by listing arguments within a single syntactic position
  • Information about implicit subjects and raised subjects is included
  • Idiomatic arguments and structures are included where they cannot be captured by more general valence frames
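
The resolution of the structural case described in the list above can be sketched as a small function. This is only an illustration of the stated rule, not code from the dictionary tools: the function name, its parameters, and the case abbreviations nom, acc and gen are assumptions made for this example.

```python
def resolve_structural_case(is_subject, negated=False, numeral_phrase=False):
    """Return the surface case that a position marked `str` stands for.

    Hypothetical helper: the abbreviations "nom", "acc" and "gen" are
    chosen for this sketch and are not taken from the dictionary format.
    """
    if is_subject:
        # Subject position: nominative for noun phrases, accusative for
        # non-agreeing numeral phrases (as represented in the LFG grammar).
        return "acc" if numeral_phrase else "nom"
    # Other (object) positions: accusative, or genitive under negation.
    return "gen" if negated else "acc"


if __name__ == "__main__":
    print(resolve_structural_case(is_subject=True))
    print(resolve_structural_case(is_subject=False, negated=True))
```

Keeping the rule in one place makes the point of str visible: a frame stores a single value, and the concrete case is decided only when the position and the polarity of the predicate are known.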

Entry structure

The dictionary, in text format, consists of a list of valence frames. Every frame is associated with a lemma, in the following format:

  • base form: aspect: frames

The actual valence information is represented as a list of syntactic positions, expressed within curly braces and separated with plus signs. A position may include more than one type of argument if the arguments can be coordinated within the same position (arguments within a position are separated by semicolons).

Positions may bear special categories (e.g. subject, passivisable object), listed before the relevant position. The following categories are distinguished:

  • subj: grammatical subject (including non-nominal subjects)

    • implicit subjects are marked as subj{E}

    • subjects whose shape is transmitted from an embedded infinitival phrase are marked as subj,controller{E}

  • obj: passivisable object (regardless of case)

  • controller, controlee: control relations between arguments, playing a role in:

    • agreement between adjectival phrases and controller arguments
    • agreement between prepositional phrases involving jak, jako, niż and controller arguments

    • establishing the controller subject of infinitival phrases
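
As a rough illustration of the entry layout described above, the following sketch splits one line of the text format into its base form, aspect, and positions, and separates the category labels from the arguments of each position. The function and class names are hypothetical, the sample entry is invented for the example, and the sketch assumes that "+" and ";" never occur unnested inside an argument (which may not hold for fixed expressions).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    categories: List[str]  # e.g. ["subj", "controller"]; empty if unmarked
    arguments: List[str]   # e.g. ["np(str)", "np(part)"]


def split_top_level(text, sep):
    """Split on `sep`, ignoring separators nested inside () or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_position(text):
    """Parse one position such as 'subj,controller{E}'."""
    labels, brace, rest = text.strip().partition("{")
    if not brace or not rest.endswith("}"):
        raise ValueError("position must be enclosed in braces: %r" % text)
    categories = [c.strip() for c in labels.split(",") if c.strip()]
    return Position(categories, split_top_level(rest[:-1], ";"))


def parse_entry(line):
    """Parse 'base form: aspect: frame' into (lemma, aspect, positions)."""
    lemma, aspect, frame = (part.strip() for part in line.split(":", 2))
    return lemma, aspect, [parse_position(p) for p in split_top_level(frame, "+")]


if __name__ == "__main__":
    # Invented entry; "_" is the bi-aspectual marker mentioned in the overview.
    sample = "hypothetical: _: subj{np(str)} + obj{np(str);np(part)} + {xp(locat)}"
    print(parse_entry(sample))
```

The nesting-aware splitter is used instead of a plain str.split so that parameters in parentheses are never cut apart by a separator that happens to appear inside them.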

In the situation where different types of arguments may be coordinated within a single position, certain categories are only relevant for a subset of listed arguments:

  • possibility of passivisation (obj) applies to nominal and sentential arguments, but not infinitival ones (which may sometimes be coordinated with sentential arguments)

  • control (controlee) is not relevant for sentential arguments
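
The two restrictions above can be expressed as a filter over the arguments of a position. This is a minimal sketch that encodes only the exclusions stated in the text and makes no claim about other argument types; the function names are hypothetical, and treating cp, ncp and prepncp as the sentential types follows the descriptions in the list of argument types.

```python
# Argument types described as sentential and infinitival in the overview.
SENTENTIAL = {"cp", "ncp", "prepncp"}
INFINITIVAL = {"infp"}


def argument_type(argument):
    """Return the type name of an argument, e.g. 'infp' for 'infp(dk)'."""
    return argument.split("(", 1)[0].strip()


def category_excluded(category, argument):
    """True if the text says the category is not relevant for the argument."""
    kind = argument_type(argument)
    if category == "obj" and kind in INFINITIVAL:
        return True  # passivisation does not apply to infinitival arguments
    if category == "controlee" and kind in SENTENTIAL:
        return True  # control is not relevant for sentential arguments
    return False


def arguments_bearing(category, arguments):
    """Keep the arguments of a position for which the category is not excluded."""
    return [a for a in arguments if not category_excluded(category, a)]


if __name__ == "__main__":
    # "cp(T)" uses a placeholder parameter; actual cp types are not listed here.
    print(arguments_bearing("obj", ["np(str)", "cp(T)", "infp(dk)"]))
```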

Since the dictionary is syntactic in nature, only the longest possible frames are listed: shorter frames are subsumed by broader ones, regardless of differences in semantics, unless they differ in terms of control relations or the presence of the subject.
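
One possible reading of this rule, given as a sketch only: a shorter frame counts as covered by a longer one if its positions form a proper subset of the longer frame's positions, both frames agree on whether a subject is present, and both contain the same positions involved in control. The function names are hypothetical, positions are compared as plain strings, and the actual criteria used by the dictionary editors may be more refined.

```python
def labels(position):
    """Category labels written before the braces, e.g. {'subj', 'controller'}."""
    head = position.partition("{")[0]
    return {label.strip() for label in head.split(",") if label.strip()}


def has_subject(frame):
    return any("subj" in labels(p) for p in frame)


def control_positions(frame):
    """Positions taking part in a control relation."""
    return {p for p in frame if labels(p) & {"controller", "controlee"}}


def is_subsumed(shorter, longer):
    """True if `shorter` need not be listed separately next to `longer`."""
    shorter, longer = set(shorter), set(longer)
    return (
        shorter < longer
        and has_subject(shorter) == has_subject(longer)
        and control_positions(shorter) == control_positions(longer)
    )


if __name__ == "__main__":
    full = ["subj{np(str)}", "obj{np(str)}", "{xp(locat)}"]
    print(is_subsumed(["subj{np(str)}", "obj{np(str)}"], full))
    print(is_subsumed(["obj{np(str)}"], full))
```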

Types of arguments

Actual arguments listed in valence frames are categorised into the following types, listed as argument(parameter1,parameter2,...):

  • np(case): nominal phrase

    • str: structural case (nominative if subject, accusative/genitive of negation otherwise)

    • pred: predicative case

    • part: partitive case (accusative or partitive genitive)

  • adjp(case): adjectival phrase

  • prepnp(preposition,case): prepositional-nominal phrase

    • comparative conjunctions jak, jako and niż are treated as prepositions governing the structural case

  • prepadjp(preposition,case): prepositional-adjectival phrase

    • postp: postprepositional case

  • comprepnp(complex preposition): complex (i.e. multi-word) preposition

  • cp(type): sentential (complementiser) phrase

  • ncp(case,type): sentential phrase with a correlative pronoun

  • prepncp(preposition,case,type): sentential phrase with a prepositional phrase correlative

  • nonch: "nonchromatic" phrase (pronominal element not replaceable by a nominal phrase)
  • infp(aspect): infinitival phrase

    • dk: perfective aspect

    • ndk: imperfective aspect

  • xp(category): "adverbial" phrases involving semantic requirements (expressible through adverbs, prepositional phrases, or sentential phrases)

    • locat: locative

    • abl: ablative

    • adl: adlative

    • perl: perlative

    • temp: temporal

    • dur: durative

    • mod: manner

  • advp(category): adverbial phrase expressible through adverbs only

    • pron: anaphoric adverbs tak and jak

    • misc: adverbs of degree and evaluation

  • lexnp(case,number,lemma,modification): idiomatic nominal arguments with lexical restrictions

    • natr: the structure of the argument cannot be expanded beyond the required form

    • atr: the structure of the argument can be freely expanded

    • ratr: the structure of the argument must be expanded (it occurs only with modifiers or complements)

    • batr: the argument must be modified by a bound possessive pronoun (własny/swój)

  • preplexnp(preposition,case,number,lemma,modification): idiomatic arguments embedded within a prepositional phrase

  • fixed(string): fixed expressions

  • or: oratio recta, i.e. direct speech

  • refl: reflexive use marked through the word się

  • E: implicit subject, transmitted subject when marked as controller
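
The inventory above can be turned into a simple validity check for argument strings. The sketch below records how many parameters each type takes according to the list and the xp categories named there; the function name is hypothetical, parameters are split naively on commas (so nested or comma-containing parameters are not handled), and later releases of the dictionary may use types or categories that this overview does not mention.

```python
# Number of parameters per argument type, as given in the list above.
ARITY = {
    "np": 1, "adjp": 1, "prepnp": 2, "prepadjp": 2, "comprepnp": 1,
    "cp": 1, "ncp": 2, "prepncp": 3, "nonch": 0, "infp": 1,
    "xp": 1, "advp": 1, "lexnp": 4, "preplexnp": 5, "fixed": 1,
    "or": 0, "refl": 0, "E": 0,
}

# xp subtypes named in the overview.
XP_CATEGORIES = {"locat", "abl", "adl", "perl", "temp", "dur", "mod"}


def parse_argument(text):
    """Split 'type(p1,p2,...)' into (type, [p1, p2, ...]) and check it."""
    name, paren, rest = text.strip().partition("(")
    if paren:
        if not rest.endswith(")"):
            raise ValueError("unbalanced parentheses in %r" % text)
        params = [p.strip() for p in rest[:-1].split(",")]
    else:
        params = []
    if name not in ARITY:
        raise ValueError("unknown argument type %r" % name)
    if len(params) != ARITY[name]:
        raise ValueError("%s takes %d parameter(s), got %d"
                         % (name, ARITY[name], len(params)))
    if name == "xp" and params[0] not in XP_CATEGORIES:
        raise ValueError("unknown xp category %r" % params[0])
    return name, params


if __name__ == "__main__":
    print(parse_argument("xp(locat)"))
    print(parse_argument("infp(dk)"))
```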

Sources

  • ŚWIDZIŃSKI, M. (1992). Realizacje zdaniowe podmiotu-mianownika, czyli o strukturalnych ograniczeniach selekcyjnych, in: A. Markowski (ed.), Opisać słowa, pp. 188–201, Dom Wydawniczy Elipsa, Warsaw.

  • ŚWIDZIŃSKI, M. (1994). Syntactic Dictionary of Polish Verbs, Uniwersytet Warszawski / Universiteit van Amsterdam.

  • Expanded electronic version of the Syntactic Dictionary of Polish Verbs, distributed with the Świgra parser (Słownik walencyjny analizatora), available on the Składnica page (http://zil.ipipan.waw.pl/Składnica).

Authors

The format of the dictionary has been devised by:

  • Filip Skwarski <Filip DOT Skwarski AT SPAMFREE ipipan DOT waw DOT pl>

  • Marek Świdziński
  • Witold Kieraś
  • Elżbieta Hajnicz <hajnicz AT SPAMFREE ipipan DOT waw DOT pl>

  • Agnieszka Patejuk <aep AT SPAMFREE ipipan DOT waw DOT pl>

  • Adam Przepiórkowski <adamp AT SPAMFREE ipipan DOT waw DOT pl>

  • Marcin Woliński <wolinski AT SPAMFREE ipipan DOT waw DOT pl>

Manual editing has been carried out by:

  • Filip Skwarski
  • Sebastian Żurowski
  • Jakub Szymczak
  • Piotr Batko
  • Joanna Filipczak
  • Marta Kalużna
  • Marcin Opacki
  • Paulina Rosalska
  • Maciej Zgondek

License

The data are available under a CC BY-SA license.

Available resources

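The following text versions of the dictionary are available:

  • the January 2013 release of the Polish Valence Dictionary (walenty_01_2013.zip, 30 January 2013),
  • the September 2013 release of the Polish Valence Dictionary (walenty_09_2013.zip, 09 September 2013), with a draft paper in Polish (walenty.20130929.1114.pdf) describing the formalism used in this version and providing some quantitative information,
  • the March 2014 release of the Polish Valence Dictionary (walenty_03_2014.zip, 03 March 2014).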

The complete dictionary database, including corpus examples illustrating the use of individual frames, may also be viewed through the online application Slowal (http://zil.ipipan.waw.pl/Slowal), which was used during the creation of the dictionary. Instructions for obtaining access to a guest account are given on the referenced page.

Errors

Please report any errors in the dictionary by sending an e-mail to: "val DYWIZ err MAŁPKA chopin KROPKA ipipan KROPKA waw KROPKA pl" (omit spaces; DYWIZ = "-", MAŁPKA = "@", KROPKA = "."; so the real address is an expanded version of v-e@c.i.w.p; if you are a spambot and you've parsed that – chapeau bas).