Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines
Jingjie Ning, Xueqi Li, Chengyu Yu

TL;DR
This paper investigates where the gains in multi-LLM revision pipelines actually come from. It finds that improvements depend on task structure, draft quality, and the type of information a draft carries, which challenges the common assumption that error correction is the main benefit.
Contribution
It introduces a controlled decomposition experiment, built on four matched conditions, that separates second-pass gains into three additive components (re-solving, scaffold, and content), and evaluates it across two model pairs on three benchmarks.
Findings
On MCQ tasks, most gains are consistent with re-solving by the stronger model.
For code tasks, structural scaffolding from drafts is beneficial even if content is weak.
Strong drafts improve weak reviewers, highlighting the importance of draft quality.
Abstract
Multi-LLM revision pipelines, in which a second model reviews and improves a draft produced by a first, are widely assumed to derive their gains from genuine error correction. We question this assumption with a controlled decomposition experiment that uses four matched conditions to separate second-pass gains into three additive components: re-solving, scaffold, and content. We evaluate this design across two model pairs on three benchmarks spanning knowledge-intensive MCQ and competitive programming. Our results show that the gains of multi-LLM revision are not monolithic, but depend on task structure, draft quality, and the type of draft information. On MCQ tasks, where the answer space is constrained and drafts provide little structural guidance, most gains are consistent with stronger-model re-solving, and directly routing queries to the stronger model can be more effective than…
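One way to picture an additive decomposition over four matched conditions is a telescoping sum of accuracy differences. The abstract excerpt does not say what the four conditions are, so the ones below (draft only, re-solve without a draft, scaffold-only draft, full draft) are assumptions made for illustration. The names `ConditionScores` and `decompose` are hypothetical, and the numbers are made up, not results from the paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConditionScores:
    """Accuracy under four ASSUMED conditions (not taken from the paper)."""
    draft_only: float      # first model's draft, no second pass
    resolve_only: float    # second model answers from scratch, sees no draft
    scaffold_only: float   # second model sees the draft's structure, content removed
    full_revision: float   # second model sees the complete draft


def decompose(s: ConditionScores) -> dict:
    """Split the total second-pass gain into three differences that telescope."""
    re_solving = s.resolve_only - s.draft_only
    scaffold = s.scaffold_only - s.resolve_only
    content = s.full_revision - s.scaffold_only
    total = s.full_revision - s.draft_only
    return {
        "re_solving": re_solving,
        "scaffold": scaffold,
        "content": content,
        "total": total,
    }


# Illustrative, made-up accuracies.
example = ConditionScores(
    draft_only=0.50, resolve_only=0.70, scaffold_only=0.72, full_revision=0.75
)
parts = decompose(example)

# The three components add up to the total gain by construction.
assert math.isclose(
    parts["re_solving"] + parts["scaffold"] + parts["content"], parts["total"]
)
```

Under this assumed layout, a large `re_solving` term with small `scaffold` and `content` terms would correspond to the pattern described for MCQ tasks, where routing the query straight to the stronger model captures most of the gain.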
