DeltaPrompts: Escaping the Zero-Delta Trap in Multimodal Distillation

Jaehun Jung; Hyunwoo Kim; Brandon Cui; Ximing Lu; David Acuna; Prithviraj Ammanabrolu; Yejin Choi

arXiv:2605.15532·cs.LG·May 20, 2026

TL;DR

This paper identifies the zero-delta prompt problem in multimodal distillation, where the teacher and student already produce the same answer distribution, proposes a method for generating high-divergence prompts, and reports reasoning gains of up to 15% across benchmarks.

Contribution

It introduces DeltaPrompts, a synthetic dataset of high-divergence prompts, and a staged synthesis pipeline to improve distillation effectiveness by targeting functional gaps.

Findings

01. DeltaPrompts improves reasoning performance by up to 15% across benchmarks.

02. High-divergence prompts are essential for effective distillation.

03. The staged synthesis pipeline effectively generates prompts that expose model failure modes.

Abstract

Distillation enables compact Vision-Language Models (VLMs) to obtain strong reasoning capabilities, yet the prompts driving this process are typically chosen via simple heuristics or aggregated from off-the-shelf datasets. We reveal a critical inefficiency in this approach: up to 69% of the prompts in standard chart/document reasoning datasets are effectively zero-delta, meaning the teacher and student already induce the exact same answer distribution. Training on these prompts provides minimal learning signal, causing student improvement to rapidly saturate regardless of data scale. To escape the zero-delta trap, we return to first principles: distillation fundamentally minimizes distributional divergence, and thus a prompt is valuable only if it exposes a functional capability gap between the teacher and student. We quantify this gap through answer divergence ($\Delta$),…
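
To make the zero-delta idea concrete, here is a minimal sketch of how one might flag such prompts. The abstract is cut off before it defines the answer divergence, so this sketch does not reproduce the authors' measure: it assumes, purely for illustration, that each model's answer distribution is estimated from a handful of sampled final answers and that the gap is measured as the total variation distance between the two empirical distributions. The function names `answer_distribution`, `answer_divergence`, and `is_zero_delta`, and the tolerance `tol`, are hypothetical.

```python
from collections import Counter


def answer_distribution(answers):
    """Empirical distribution over normalized final answers (assumed estimator)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(a.strip().lower() for a in answers)
    total = sum(counts.values())
    return {answer: count / total for answer, count in counts.items()}


def answer_divergence(teacher_answers, student_answers):
    """Total variation distance between the two empirical answer distributions.

    This is an illustrative stand-in for the divergence in the abstract,
    not the paper's definition. The value lies in [0, 1].
    """
    p = answer_distribution(teacher_answers)
    q = answer_distribution(student_answers)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def is_zero_delta(teacher_answers, student_answers, tol=1e-9):
    """Flag a prompt whose teacher and student answer distributions coincide."""
    return answer_divergence(teacher_answers, student_answers) <= tol


# Hypothetical sampled answers for two prompts.
same = is_zero_delta(["42", "42", "42"], ["42", "42", "42"])
gap = answer_divergence(["a", "a", "b", "b"], ["a", "a", "a", "a"])
print(same, gap)
```

Under these assumptions, a prompt with zero divergence carries no disagreement for the student to learn from, while a prompt with a large value marks a place where the teacher and student behave differently. With few samples per prompt the estimate is noisy, so a practical filter would likely need more samples or a looser tolerance.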
