AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs
Aahana Basappa, Pranay Goel, Anusri Karra, Anish Karra, Asa Gilmore, Kevin Zhu

TL;DR
This paper introduces AMVICC, a new benchmark for systematically profiling failure modes in vision-language models across image-to-text and text-to-image tasks, revealing shared and modality-specific limitations.
Contribution
The paper presents AMVICC, a novel benchmark that enables cross-modal failure mode analysis of VLMs and IGMs, highlighting their common and distinct weaknesses in visual reasoning.
Findings
Models share many failure modes across modalities.
IGMs struggle with manipulating visual components.
Failures are often model-specific and modality-specific.
Abstract
We investigated visual reasoning limitations of both multimodal large language models (MLLMs) and image generation models (IGMs) by creating a novel benchmark to systematically compare failure modes across image-to-text and text-to-image tasks, enabling cross-modal evaluation of visual understanding. Despite rapid growth in machine learning, vision language models (VLMs) still fail to understand or generate basic visual concepts such as object orientation, quantity, or spatial relationships, which highlighted gaps in elementary visual reasoning. By adapting MMVP benchmark questions into explicit and implicit prompts, we create \textit{AMVICC}, a novel benchmark for profiling failure modes across various modalities. After testing 11 MLLMs and 3 IGMs in nine categories of visual reasoning, our results show that failure modes are often shared between models and modalities, but certain…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
