When Less is Enough: Efficient Inference via Collaborative Reasoning

Yilei Chen; Sharut Gupta; Yannis Paschalidis; Ayush Sekhari; Aldo Pacchiano

arXiv:2605.01111·cs.LG·May 5, 2026

TL;DR

DUET is a collaborative inference framework in which a capable model sends a reasoning signal to a lightweight model that produces the final answer, cutting inference tokens by up to 60% on reasoning benchmarks while maintaining strong reasoning performance.

Contribution

This work introduces DUET, a two-stage inference method that separates reasoning (handled by a capable model) from prediction (handled by a lightweight model), lowering inference cost without sacrificing task performance.
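
As a rough illustration of the two-stage split described here, the sketch below passes a question through a "capable" callable that emits a reasoning signal and a "lightweight" callable that turns the question plus signal into an answer. Everything in it is an assumption made for illustration: the names `duet_infer`, `DuetResult`, `toy_capable`, and `toy_lightweight` are hypothetical, the two models are toy stand-ins rather than real language models, and tokens are counted by whitespace splitting rather than by a real tokenizer. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DuetResult:
    signal: str          # reasoning signal produced in stage 1
    answer: str          # final answer produced in stage 2
    signal_tokens: int   # crude size of the signal


def count_tokens(text: str) -> int:
    # Whitespace split as a crude stand-in for a real tokenizer (assumption).
    return len(text.split())


def duet_infer(
    question: str,
    capable_model: Callable[[str], str],
    lightweight_model: Callable[[str, str], str],
) -> DuetResult:
    # Stage 1: the capable model emits a reasoning signal.
    signal = capable_model(question)
    # Stage 2: the lightweight model reads the question and the signal
    # and generates the final answer.
    answer = lightweight_model(question, signal)
    return DuetResult(signal, answer, count_tokens(signal))


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_capable(question: str) -> str:
    return "add the two numbers"


def toy_lightweight(question: str, signal: str) -> str:
    numbers = [int(tok) for tok in question.split() if tok.isdigit()]
    return str(sum(numbers)) if "add" in signal else ""


result = duet_infer("What is 17 plus 25 ?", toy_capable, toy_lightweight)
print(result.answer, result.signal_tokens)
```

In this toy setup the capable model never writes out the arithmetic; it only says what to do, and the lightweight model carries it out.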

Findings

01

DUET reduces inference tokens by up to 60% on reasoning benchmarks.

02

Length-penalized joint training encourages the capable model to transmit only the information the lightweight model needs to solve the task.

03

DUET maintains strong reasoning performance with lower computational cost.

Abstract

In this work, we introduce DUET (Dual-model Efficient Two-stage inference), a collaborative inference framework in which a capable model and a lightweight model work together to solve a task. Relying on a single large model to perform end-to-end reasoning and prediction often incurs substantial inference cost. In contrast, DUET decomposes inference into two stages: the capable model produces a reasoning signal, and the lightweight model interprets this signal to generate the final answer, allowing reasoning-intensive computation to be handled by the capable model while non-reasoning-intensive components are delegated to the lightweight model without sacrificing task performance. To achieve this objective, we propose a length-penalized joint training objective that encourages the capable model to transmit only the information that is sufficient for the lightweight model to solve the…
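
The abstract excerpt does not give the form of the length-penalized objective. One plausible shape, shown purely as an assumption, is a reward that credits a correct final answer and subtracts a cost proportional to the length of the reasoning signal. The function name `length_penalized_reward`, the linear penalty, and the coefficient value are all hypothetical choices for this sketch, not details taken from the paper.

```python
def length_penalized_reward(
    correct: bool, signal_tokens: int, penalty: float = 0.01
) -> float:
    # Assumed form: task reward minus a linear cost on signal length.
    # The paper's actual objective may differ.
    task_reward = 1.0 if correct else 0.0
    return task_reward - penalty * signal_tokens


# Two signals that both lead to a correct answer: the shorter one scores higher.
short_signal_reward = length_penalized_reward(correct=True, signal_tokens=20)
long_signal_reward = length_penalized_reward(correct=True, signal_tokens=80)

# A short signal that leads to a wrong answer still scores below both,
# so brevity alone is not rewarded under this choice of coefficient.
wrong_answer_reward = length_penalized_reward(correct=False, signal_tokens=5)

print(short_signal_reward > long_signal_reward > wrong_answer_reward)
```

Under a reward of this kind, the capable model would be pushed toward the shortest signal that still lets the lightweight model answer correctly, which matches the behavior the abstract describes. How strongly it is pushed depends on the penalty coefficient, which here is an arbitrary value.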
