DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation
Jaehun Jung, Hyunwoo Kim, Brandon Cui, Ximing Lu, David Acuna, Prithviraj Ammanabrolu, Yejin Choi

TL;DR
This paper shows that many prompts used in multimodal distillation are zero-delta (teacher and student already produce the same answer distribution), proposes a method to generate high-divergence prompts instead, and reports gains of up to 15% across reasoning benchmarks.
Contribution
It introduces DeltaPrompts, a synthetic dataset of high-divergence prompts, together with a staged synthesis pipeline that targets functional capability gaps between teacher and student to make distillation more effective.
Findings
DeltaPrompts improves reasoning performance by up to 15% across benchmarks.
High-divergence prompts are essential for effective distillation.
The staged synthesis pipeline effectively generates prompts that expose model failure modes.
Abstract
Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart/document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence,…
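The abstract is cut off before it defines answer divergence precisely, so the sketch below only illustrates the idea under stated assumptions: for each prompt we sample several final answers from the teacher and from the student, normalize them by stripping whitespace and lowercasing, and compare the two empirical answer distributions with total variation distance. The choice of total variation, the normalization rule, and every function name here (`answer_divergence`, `is_zero_delta`, `select_high_delta`) are hypothetical stand-ins, not the paper's definitions. A prompt counts as zero-delta when the divergence is (near) zero.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def empirical_distribution(answers: Sequence[str]) -> Dict[str, float]:
    """Turn sampled answers into an empirical distribution over normalized answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Assumed normalization: exact match after stripping and lowercasing.
    counts = Counter(a.strip().lower() for a in answers)
    n = len(answers)
    return {a: c / n for a, c in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: half the L1 distance over the union of supports."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def answer_divergence(teacher: Sequence[str], student: Sequence[str]) -> float:
    """Hypothetical answer divergence: TV distance between answer distributions."""
    return total_variation(empirical_distribution(teacher),
                           empirical_distribution(student))


def is_zero_delta(teacher: Sequence[str], student: Sequence[str],
                  eps: float = 0.0) -> bool:
    """A prompt is (effectively) zero-delta if the divergence is at most eps."""
    return answer_divergence(teacher, student) <= eps


def select_high_delta(samples: Dict[str, Tuple[List[str], List[str]]],
                      threshold: float) -> List[str]:
    """Keep prompt ids whose teacher/student divergence exceeds the threshold."""
    return [pid for pid, (t, s) in samples.items()
            if answer_divergence(t, s) > threshold]


if __name__ == "__main__":
    pool = {
        "p1": (["42", "42", "42", "42"], ["42", "42", "42", "42"]),  # zero-delta
        "p2": (["A", "A", "A", "B"], ["A", "A", "B", "B"]),          # small gap
        "p3": (["7", "7", "7", "7"], ["9", "9", "3", "9"]),          # large gap
    }
    for pid, (t, s) in pool.items():
        print(pid, answer_divergence(t, s), is_zero_delta(t, s))
    print(select_high_delta(pool, threshold=0.5))
```

With these inputs, `p1` has divergence 0.0 and is flagged zero-delta, `p2` has divergence 0.25, and only `p3` (divergence 1.0) survives a 0.5 threshold. Training on `p1` would give the student essentially no signal, which is the inefficiency the abstract describes. Total variation is used here only because it is bounded in [0, 1] and easy to read; a KL- or Jensen-Shannon-based measure would fit the same filtering loop.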
