MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence

Yifan Chen; Fei Yin; Qingyan Bai; Zicheng Lin; Yujiu Yang

arXiv:2605.12703 · cs.CV · May 14, 2026

TL;DR

MMCL-Bench is a benchmark for multimodal context learning: models must recover visual evidence from a teaching context and reason over it to handle new visual instances. Evaluation across its 102 tasks reveals large gaps in current systems.

Contribution

This paper introduces MMCL-Bench, a 102-task benchmark for multimodal context learning from visual or mixed-modality teaching context. The tasks span three categories: rule system application, procedural task execution, and empirical discovery and induction.

Findings

1. Even the strongest model evaluated solves fewer than one-third of tasks under strict rubric-based evaluation.

2. Failures occur at every stage: context anchoring, evidence extraction, reasoning, and response construction.

3. MMCL-Bench exposes critical bottlenecks in the multimodal context learning capabilities of current models.

Abstract

We introduce MMCL-Bench, a benchmark for multimodal context learning: learning task-local rules, procedures, and empirical patterns from visual or mixed-modality teaching context and applying them to new visual instances. Unlike text-only context learning or standard multimodal question answering, this setting requires models to recover and localize relevant evidence from images, screenshots, manuals, videos, and frame sequences before they can reason over the learned context. MMCL-Bench contains 102 tasks spanning three categories: rule system application, procedural task execution, and empirical discovery and induction. We evaluate frontier multimodal models with strict rubric-based scoring and find that current systems remain far from robust multimodal context learning, with even the strongest model solving fewer than one-third of tasks under strict evaluation. Diagnostic ablations…
