VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging

Ming Zhong; Yuanlei Wang; Liuzhou Zhang; Arctanx An; Renrui Zhang; Hao Liang; Ming Lu; Ying Shen; Wentao Zhang

arXiv:2511.18121·cs.CV·November 25, 2025

TL;DR

This paper introduces VCU-Bridge, a hierarchical framework for visual connotation understanding modeled on human reasoning, together with the HVCU-Bench benchmark and a data generation pipeline, with the aim of improving multimodal model performance across reasoning levels.

Contribution

The paper proposes a hierarchical reasoning framework (VCU-Bridge) and a benchmark (HVCU-Bench) for visual connotation understanding, and develops a data generation pipeline to strengthen model capabilities.

Findings

01  Performance declines at higher reasoning levels.

02  Strengthening low-level perception improves high-level reasoning.

03  Method improves results on HVCU-Bench and general benchmarks.

Abstract

While Multimodal Large Language Models (MLLMs) excel on benchmarks, their processing paradigm differs from the human ability to integrate visual information. Unlike humans who naturally bridge details and high-level concepts, models tend to treat these elements in isolation. Prevailing evaluation protocols often decouple low-level perception from high-level reasoning, overlooking their semantic and causal dependencies, which yields non-diagnostic results and obscures performance bottlenecks. We present VCU-Bridge, a framework that operationalizes a human-like hierarchy of visual connotation understanding: multi-level reasoning that advances from foundational perception through semantic bridging to abstract connotation, with an explicit evidence-to-inference trace from concrete cues to abstract conclusions. Building on this framework, we construct HVCU-Bench, a benchmark for hierarchical…
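
The three-level hierarchy described in the abstract (foundational perception, semantic bridging, abstract connotation) suggests scoring a model per level and also over the whole evidence-to-inference chain. The sketch below is illustrative only: the `ChainResult` record, the level names used as field names, and the "full-chain" metric are assumptions made for this example, not the scoring protocol defined by the paper or by HVCU-Bench.

```python
from dataclasses import dataclass

# Hypothetical level names, taken from the abstract's description of the hierarchy.
LEVELS = ("perception", "bridging", "connotation")


@dataclass(frozen=True)
class ChainResult:
    """Correctness of one model answer at each level for a single image (assumed format)."""
    perception: bool
    bridging: bool
    connotation: bool


def level_accuracy(results):
    """Return the fraction of correct answers at each level."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    return {level: sum(getattr(r, level) for r in results) / n for level in LEVELS}


def full_chain_accuracy(results):
    """Return the fraction of samples answered correctly at every level.

    This counts a sample only when the whole chain from concrete cue to
    abstract conclusion holds, so a lucky guess at the top level does not count.
    """
    if not results:
        raise ValueError("results must not be empty")
    solved = sum(all(getattr(r, level) for level in LEVELS) for r in results)
    return solved / len(results)


# Toy data, invented for illustration.
results = [
    ChainResult(True, True, True),
    ChainResult(True, True, False),
    ChainResult(True, False, False),
    ChainResult(False, False, True),  # right conclusion without supporting evidence
]

print(level_accuracy(results))       # {'perception': 0.75, 'bridging': 0.5, 'connotation': 0.5}
print(full_chain_accuracy(results))  # 0.25
```

In this toy data the last sample reaches the correct connotation without correct perception, so it raises connotation accuracy but not full-chain accuracy. Reporting both numbers is one way to expose the kind of dependency between levels that decoupled evaluation would hide.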

Code & Models

Datasets

Chime316/HVCU-Bench
dataset · 59 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)