Quantification and object perception in Multimodal Large Language Models and human linguistic cognition
Raquel Montero, Natalia Moskvina, Paolo Morosi, Tamara Serrano, Elena Pagliarini, Evelina Leivada

TL;DR
This study investigates whether Multimodal Large Language Models encode human-like features of quantification. Thinking models estimate numerosity with high accuracy, but the models differ from humans in quantifier ranges of use and prototypicality, and results vary across languages.
Contribution
It explores three key features of human quantification in MLLMs, comparing model architectures and languages to understand their semantic and pragmatic capabilities.
Findings
High accuracy in numerosity estimation by thinking models
Models differ from humans in quantifier ranges and prototypicality
Cross-linguistic analysis shows robustness and variability
Abstract
Quantification has proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, because quantification interfaces with the logical, pragmatic, and numerical domains, the exact reasons for the poor performance remain unclear. This paper looks at three key features of human quantification that are shared cross-linguistically and have so far remained unexplored in the (M)LLM literature: the ordering of quantifiers into scales, the ranges of use and prototypicality, and the biases inherent in the human approximate number system. The aim is to determine how these features are encoded in the models' architecture, how they may differ from humans, and whether the results are affected by the type of model (thinking vs. instruct) and the language under investigation. Results show that although thinking models showed a high accuracy in the…
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Multimodal Machine Learning Applications
