The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

Hong Li; Nanxi Li; Yuanjie Chen; Jianbin Zhu; Qinlu Guo; Cewu Lu; Yong-Lu Li

arXiv:2410.01417·cs.CV·March 4, 2025

TL;DR

This paper introduces a new benchmark to evaluate multi-modal large language models' ability to form associations between observations and prior knowledge, revealing current models' significant limitations compared to humans.

Contribution

It proposes an annotation-free, systematically refined benchmark for association tasks in MLLMs, covering multiple levels and dimensions of association capabilities.

Findings

1. Current open-source MLLMs perform poorly on association tasks.

2. Even GPT-4V(vision) shows a significant gap compared to human performance.

3. The benchmark facilitates future research to improve MLLM associative abilities.

Abstract

Multi-modal Large Language Models (MLLMs) have exhibited impressive capability. However, many deficiencies of MLLMs relative to human intelligence have recently been found, e.g., hallucination. To drive the study of MLLMs, the community has dedicated efforts to building larger benchmarks with complex tasks. In this paper, we propose benchmarking an essential but usually overlooked intelligence: association, a human's basic capability to link observation and prior practice memory. To comprehensively investigate MLLMs' performance on association, we formulate the association task and devise a standard benchmark based on adjective and verb semantic concepts. Instead of costly data annotation and curation, we propose a convenient annotation-free construction method that transforms a general dataset for our association tasks. Simultaneously, we devise a rigorous…

Videos

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Translation Studies and Practices