Quantification and object perception in Multimodal Large Language Models and human linguistic cognition

Raquel Montero; Natalia Moskvina; Paolo Morosi; Tamara Serrano; Elena Pagliarini; Evelina Leivada

arXiv:2511.08126 · cs.CL · March 26, 2026

TL;DR

This study investigates how Multimodal Large Language Models encode human-like quantification features, revealing high numerosity estimation accuracy but notable differences from humans in quantifier use and prototypicality across languages.

Contribution

The paper examines three key features of human quantification in MLLMs, comparing model types (thinking vs. instruct) and languages to assess the models' semantic and pragmatic capabilities.

Findings

1. High accuracy in numerosity estimation by thinking models
2. Models differ from humans in quantifier ranges and prototypicality
3. Cross-linguistic analysis shows robustness and variability

Abstract

Quantification has proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the logical, pragmatic, and numerical domains, the exact reasons for the poor performance are still unclear. This paper looks at three key features of human quantification shared cross-linguistically that have remained so far unexplored in the (M)LLM literature: the ordering of quantifiers into scales, the ranges of use and prototypicality, and the biases inherent in the human approximate number system. The aim is to determine how these features are encoded in the models' architecture, how they may differ from humans, and whether the results are affected by the type of model (thinking vs. instruct) and the language under investigation. Results show that although thinking models showed high accuracy in the…
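The abstract refers to the biases of the human approximate number system (ANS) without spelling them out. As a rough illustration, the Python sketch below implements the textbook ANS model with scalar variability (Weber's law): the noise of a numerosity estimate grows in proportion to the numerosity itself, so discrimination accuracy depends on the ratio between two quantities rather than on their absolute difference. The function names and the Weber fraction w = 0.2 are hypothetical choices for illustration; this is not the paper's method, and the values are not taken from it.

```python
import math
import random
from typing import Optional


def ans_discrimination_accuracy(n1: int, n2: int, w: float = 0.2) -> float:
    """Expected proportion of correct 'which set is larger?' judgments.

    Assumes the standard linear-scale ANS model with scalar variability:
    numerosity n is represented as a Gaussian with mean n and standard
    deviation w * n. The Weber fraction w = 0.2 is an illustrative value,
    not one reported in the paper.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("numerosities must be positive")
    # The difference of the two noisy representations has sd w * sqrt(n1^2 + n2^2).
    # Phi(z) = 1 - 0.5 * erfc(z / sqrt(2)); the sqrt(2) is folded into `spread`.
    spread = math.sqrt(2) * w * math.sqrt(n1 ** 2 + n2 ** 2)
    return 1 - 0.5 * math.erfc(abs(n1 - n2) / spread)


def noisy_estimate(n: int, w: float = 0.2, rng: Optional[random.Random] = None) -> float:
    """Draw one approximate estimate of n whose noise sd grows in proportion to n."""
    if rng is None:
        rng = random.Random()
    return rng.gauss(n, w * n)


if __name__ == "__main__":
    # Same 1:2 ratio, different absolute sizes: predicted accuracy is the same.
    print(ans_discrimination_accuracy(10, 20), ans_discrimination_accuracy(20, 40))
    # A closer ratio (18:20) pushes predicted accuracy toward chance (0.5).
    print(ans_discrimination_accuracy(18, 20))
```

Because both functions depend on n only through w * n, doubling both quantities leaves the predicted accuracy unchanged, which is the ratio effect characteristic of human numerosity judgments.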

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Multimodal Machine Learning Applications