Core Knowledge Deficits in Multi-Modal Language Models

Yijiang Li; Qingying Gao; Tianwei Zhao; Bingyang Wang; Haoran Sun; Haiyun Lyu; Robert D. Hawkins; Nuno Vasconcelos; Tal Golan; Dezhi Luo; Hokin Deng

arXiv:2410.10855 · cs.CL · June 23, 2025



TL;DR

This paper investigates the core knowledge deficits in multi-modal large language models, introducing a benchmark and evaluation methods to reveal their limitations in fundamental cognitive abilities.

Contribution

The paper introduces CoreCognition, a comprehensive benchmark for core knowledge, and proposes Concept Hacking, a new evaluation method to analyze MLLMs' understanding of fundamental concepts.

Findings

01. MLLMs underperform on low-level abilities compared to high-level ones.
02. MLLMs show limited scalability on core knowledge tasks.
03. MLLMs rely on shortcut learning rather than genuine understanding.

Abstract

While Multi-modal Large Language Models (MLLMs) demonstrate impressive abilities in high-level perception and reasoning, their robustness in the wild remains limited, and they often fall short on tasks that are intuitive and effortless for humans. We examine the hypothesis that these deficiencies stem from the absence of core knowledge: rudimentary cognitive abilities innate to humans from early childhood. To explore the core knowledge representation in MLLMs, we introduce CoreCognition, a large-scale benchmark encompassing 12 core knowledge concepts grounded in developmental cognitive science. We evaluate 230 models with 11 different prompts, leading to a total of 2,530 data points for analysis. Our experiments uncover four key findings, collectively demonstrating core knowledge deficits in MLLMs: they consistently underperform and show reduced, or even absent, scalability on low-level…
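The figure of 2,530 data points follows from crossing every evaluated model with every prompt (230 × 11). The sketch below reproduces only that count, assuming each data point is one (model, prompt) evaluation run; the model and prompt identifiers are hypothetical placeholders, since the actual names are not given here.

```python
from itertools import product

# Counts stated in the abstract; the identifiers below are hypothetical
# placeholders, not the models or prompts actually used in CoreCognition.
NUM_MODELS = 230
NUM_PROMPTS = 11

models = [f"model_{i:03d}" for i in range(NUM_MODELS)]
prompts = [f"prompt_{j:02d}" for j in range(NUM_PROMPTS)]

# Assumption: each (model, prompt) pair yields one data point for analysis.
evaluation_grid = list(product(models, prompts))

print(len(evaluation_grid))  # 2530
```

Enumerating the full cross product makes the data-point count explicit and leaves room to attach a per-run score to each pair later.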


Code & Models

Repositories

siquanhuang/Multi-metrics_against_backdoors_in_FL
pytorch

Datasets

williamium/CoreCognition
dataset · 259 downloads

Videos

Core Knowledge Deficits in Multi-Modal Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques