VCU-Bridge: Hierarchical Visual Connotation Understanding via Semantic Bridging
Ming Zhong, Yuanlei Wang, Liuzhou Zhang, Arctanx An, Renrui Zhang, Hao Liang, Ming Lu, Ying Shen, Wentao Zhang

TL;DR
This paper introduces VCU-Bridge, a hierarchical framework for visual connotation understanding modeled on human reasoning, together with a benchmark (HVCU-Bench) and a data generation pipeline intended to improve multimodal model performance across reasoning levels.
Contribution
The paper proposes a hierarchical reasoning framework and a benchmark for visual connotation understanding, and develops a data generation pipeline to enhance model capabilities.
Findings
Model performance declines at higher reasoning levels.
Strengthening low-level perception improves high-level reasoning.
The proposed method improves results on HVCU-Bench and on general benchmarks.
Abstract
While Multimodal Large Language Models (MLLMs) excel on benchmarks, their processing paradigm differs from the human ability to integrate visual information. Unlike humans who naturally bridge details and high-level concepts, models tend to treat these elements in isolation. Prevailing evaluation protocols often decouple low-level perception from high-level reasoning, overlooking their semantic and causal dependencies, which yields non-diagnostic results and obscures performance bottlenecks. We present VCU-Bridge, a framework that operationalizes a human-like hierarchy of visual connotation understanding: multi-level reasoning that advances from foundational perception through semantic bridging to abstract connotation, with an explicit evidence-to-inference trace from concrete cues to abstract conclusions. Building on this framework, we construct HVCU-Bench, a benchmark for hierarchical…
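The three-level hierarchy described in the abstract (foundational perception, semantic bridging, abstract connotation) can be pictured as a per-image record with one question per level, scored level by level. The sketch below is only an illustration under stated assumptions: the class names, the level labels, the exact-match scoring, and the "full chain" metric are hypothetical and are not taken from HVCU-Bench or the authors' code.

```python
from dataclasses import dataclass

# Hypothetical level labels mirroring the hierarchy named in the abstract.
LEVELS = ("perception", "bridge", "connotation")


@dataclass
class LevelItem:
    question: str
    answer: str      # gold answer (assumed format)
    prediction: str  # model answer (assumed format)

    def correct(self) -> bool:
        # Exact-match scoring is an assumption made for this sketch.
        return self.prediction == self.answer


@dataclass
class HierarchicalSample:
    image_id: str
    items: dict  # maps each level name in LEVELS to a LevelItem


def per_level_accuracy(samples):
    """Fraction of samples answered correctly at each level."""
    counts = {lvl: 0 for lvl in LEVELS}
    for s in samples:
        for lvl in LEVELS:
            counts[lvl] += s.items[lvl].correct()
    return {lvl: counts[lvl] / len(samples) for lvl in LEVELS}


def full_chain_accuracy(samples):
    """Fraction of samples where every level is answered correctly."""
    ok = sum(all(s.items[lvl].correct() for lvl in LEVELS) for s in samples)
    return ok / len(samples)
```

Reporting accuracy per level, rather than a single pooled score, is what would make a decline at higher levels visible; the full-chain score additionally checks whether a correct abstract answer is backed by correct lower-level answers. Both functions assume a non-empty list of samples.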
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
