When Less is Enough: Efficient Inference via Collaborative Reasoning
Yilei Chen, Sharut Gupta, Yannis Paschalidis, Ayush Sekhari, Aldo Pacchiano

TL;DR
DUET is a collaborative inference framework where a capable model provides reasoning signals to a lightweight model, reducing inference costs by up to 60% while maintaining strong reasoning performance.
Contribution
This work introduces DUET, a novel two-stage inference method that separates reasoning and prediction, significantly lowering inference costs without sacrificing accuracy.
Findings
DUET reduces inference tokens by up to 60% on reasoning benchmarks.
Length-penalized training encourages the capable model to transmit only the information the lightweight model needs.
DUET maintains strong reasoning performance with lower computational cost.
Abstract
In this work, we introduce DUET (Dual-model Efficient Two-stage inference), a collaborative inference framework in which a capable model and a lightweight model work together to solve a task. Relying on a single large model for end-to-end reasoning and prediction often incurs substantial inference cost. DUET instead decomposes inference into two stages: the capable model produces a reasoning signal, and the lightweight model interprets this signal to generate the final answer. Reasoning-intensive computation is thus handled by the capable model, while the less reasoning-intensive remainder is delegated to the lightweight model without sacrificing task performance. To achieve this, we propose a length-penalized joint training objective that encourages the capable model to transmit only the information that is sufficient for the lightweight model to solve the…
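The two-stage decomposition and the length penalty described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: models are treated as hypothetical callables mapping a token list to output tokens, the `<signal>` delimiters are invented, and the objective is assumed to take the simple form "task reward minus `lam` times signal length", with `lam` a hypothetical coefficient.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a prompt (list of tokens) to output tokens.
Model = Callable[[List[str]], List[str]]


@dataclass
class DuetResult:
    signal: List[str]    # reasoning signal emitted by the capable model
    answer: List[str]    # final answer produced by the lightweight model
    capable_tokens: int  # tokens generated by the capable model
    light_tokens: int    # tokens generated by the lightweight model


def duet_infer(question: List[str], capable: Model, light: Model) -> DuetResult:
    """Sketch of two-stage inference: capable model reasons, lightweight model answers."""
    # Stage 1: the capable model produces a reasoning signal for the question.
    signal = capable(question)
    # Stage 2: the lightweight model reads the question plus the signal
    # (wrapped in hypothetical delimiter tokens) and generates the answer.
    answer = light(question + ["<signal>"] + signal + ["</signal>"])
    return DuetResult(signal, answer, len(signal), len(answer))


def length_penalized_objective(task_reward: float, signal_len: int, lam: float) -> float:
    """Assumed objective: reward for a correct final answer minus a cost per signal token."""
    return task_reward - lam * signal_len


if __name__ == "__main__":
    # Toy models: the lightweight model only answers correctly if the signal
    # contains the hint it needs.
    capable = lambda q: ["add", "2", "3"]
    light = lambda p: ["5"] if "add" in p else ["?"]
    res = duet_infer(["what", "is", "2+3"], capable, light)
    print(res.answer, res.capable_tokens)
    print(length_penalized_objective(1.0, res.capable_tokens, lam=0.1))
```

Under this assumed objective, two signals that both let the lightweight model succeed earn the same task reward, so the shorter one scores higher; this is the pressure toward transmitting only sufficient information that the abstract describes.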
