#format wiki
#language en
#acl +All:read Default

= Quantifiers project =

== Project factsheet ==

|| English name:         || Quantifiers in Language: Use and Meaning ||
|| Polish name:          || Kwantyfikatory w języku: użycie i znaczenie ||
|| Project type:         || A [[http://www.ncn.gov.pl/|National Science Centre]] grant 2017/25/B/HS1/02911 ||
|| Duration:             || 19 April 2018 ‒ 17 April 2022 ||
## || Project Web page:     || (under preparation) ||
|| Principal investigator: || Jakub Szymanik ||

== Project summary ==

=== Research project objectives ===

Language is the primary means of human communication, and the main vehicle for scientific as well as common
sense reasoning. It has been investigated extensively from various theoretical perspectives, including those of linguistics,
logic, and computer science. At the heart of this multi-faceted enterprise of semantics is a simple yet powerful
conception of the meaning of linguistic expressions. Namely, the meaning of a sentence is taken to lie in its
truth-conditions: one understands a sentence if one knows under which circumstances the sentence is true, and
under which it is false. This notion of meaning has been very fruitful, resulting in a wealth of practical
applications, e.g., in computer science, including dialogue systems, automated reasoning, information retrieval
and search. The undeniable advantages of this theoretical endeavor invite a question: to what extent can it
account for human linguistic behavior? The past decade has seen increasing interaction between
cognitive science and linguistics. The new field of experimental semantics and pragmatics has been facing many
challenges. One of the main difficulties is of a practical nature: experimental semantics is in serious need of
natural linguistic data. The recent advances in the field have focused on acquiring psycholinguistic data via lab
experimentation. Such data are crucial for understanding language processing; however, they pose a problem from
the perspective of semantic interpretation. Namely, the subjects in such experiments do not need to follow their
everyday linguistic behavior; they can develop various cognitive strategies, for instance trading off speed against
linguistic accuracy. A psychologist would say that such data are not necessarily ecologically valid. Therefore, to
get a full picture of how people interpret language we need data coming from the real use of language. This
project will build a linguistic corpus annotated with semantic information that will offer a plethora of
information for language theory. In that way, we propose to partially solve the natural linguistic information
bottleneck problem that currently constrains the development of semantics and pragmatics.

=== Research project methodology ===

The project will build and analyze a Polish corpus annotated with the semantic properties of quantifiers. Quantifiers
are one of the main research topics in semantics and pragmatics.

=== Expected impact of the research project on the development of science ===

The corpus and the toolkit for detecting and analyzing quantifiers will be freely available online. Thus, the
project will solve the problem of the lack of natural language data that slows down the scientific progress of
language theory. Moreover, the project will provide a methodological basis for the development of further
natural language corpora annotated with other semantic and pragmatic phenomena. Based on the analysis of the
corpus, we will also be able to better understand the semantic and cognitive factors responsible for the natural use
of language, developing the methodology proposed in (Szymanik and Thorne, 2017). In other words, the project
will provide data that, together with the existing theory of quantifiers (Szymanik, 2016), will create a unique
opportunity for a better understanding of meaning.

=== Results ===

==== The corpus ====

 * [[http://kwantyfikatory.nlp.ipipan.waw.pl/|The Corpus of Quantificational Expressions]], indexed in the MTAS search engine (beta version)

==== Papers ====

 * Woliński M., Nitoń B., Kieraś W., Szymanik J. (2022). [[attachment:lrec2022.pdf|HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish]]. To appear in the Proceedings of the 13th Edition of Language Resources and Evaluation Conference (LREC 2022).
 * Szymanik J., Kieraś W. (2022). [[https://doi.org/10.1007/s10579-022-09578-4|The semantically annotated corpus of Polish quantificational expressions]]. Language Resources & Evaluation.

=== References ===

 * Carcassi F., Steinert-Threlkeld S., Szymanik J. (2021). [[https://doi.org/10.1111/cogs.13027|Monotone Quantifiers Emerge via Iterated Learning]]. Cognitive Science 45(8):e13027.
 * van de Pol I., Lodder P., van Maanen L., Steinert-Threlkeld S., Szymanik J. (2021). [[https://doi.org/10.31234/osf.io/xuhyr|Quantifiers Satisfying Semantic Universals Are Simpler]]. !PsyArXiv preprint.
 * Steinert-Threlkeld S., Szymanik J. (2020). [[https://doi.org/10.1016/j.cognition.2019.104076|Ease of learning explains semantic universals]]. Cognition 195:104076. 
 * Szymanik J., Thorne C. (2017). [[https://doi.org/10.1016/j.langsci.2017.01.006|Exploring the relation of semantic complexity and quantifier distribution in large corpora]]. Language Sciences 60:80–93.
 * Szymanik J. (2016). [[https://link.springer.com/book/10.1007/978-3-319-28749-2|Quantifiers and Cognition. Logical and Computational Perspectives]]. Studies in Linguistics and Philosophy 96. Springer.