Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing
Boyuan Chen, Mingzhi Zhu, Brendan Dolan-Gavitt, Muhammad Shafique, Siddharth Garg

TL;DR
This paper introduces a cascaded black-box framework that combines model cascading with self-testing to optimize the cost-accuracy tradeoff in LLM-based code completion, reducing cost by 26% on average (up to 70% in the best case) while maintaining or improving accuracy.
Contribution
It proposes a novel, inference-time, black-box framework that dynamically balances model size and self-testing to optimize cost and accuracy in code generation.
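To make the idea of balancing model size against self-testing concrete, here is a minimal sketch of a cascade that escalates to a larger model only when the smaller model's answer fails too many of its own generated tests. It is an illustration under stated assumptions, not the authors' implementation: the `Stage` type, the per-stage `cost` and `threshold` fields, the pass-rate acceptance rule, and the rule that the last stage always answers are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box interfaces: each model is exposed only as callables.
GenerateFn = Callable[[str], str]        # prompt -> candidate solution source
TestGenFn = Callable[[str], List[str]]   # prompt -> self-generated assert statements


@dataclass
class Stage:
    name: str
    cost: float          # assumed cost of running this stage, arbitrary units
    generate: GenerateFn
    gen_tests: TestGenFn
    threshold: float     # minimum self-test pass rate needed to accept the answer


def pass_rate(solution: str, tests: Sequence[str]) -> float:
    """Return the fraction of self-generated tests that the solution passes."""
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        env: dict = {}
        try:
            # Unsandboxed exec is for illustration only; real use needs isolation.
            exec(solution, env)
            exec(test, env)
            passed += 1
        except Exception:
            pass  # a failing or crashing test simply does not count as passed
    return passed / len(tests)


def cascade(prompt: str, stages: Sequence[Stage]) -> Tuple[str, str, float]:
    """Try stages from cheapest to most expensive; stop at the first accepted answer.

    Returns (name of the answering stage, solution source, total cost spent).
    Assumption: the final stage's answer is returned regardless of its score.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    total_cost = 0.0
    for index, stage in enumerate(stages):
        solution = stage.generate(prompt)
        tests = stage.gen_tests(prompt)
        total_cost += stage.cost
        is_last = index == len(stages) - 1
        if is_last or pass_rate(solution, tests) >= stage.threshold:
            return stage.name, solution, total_cost
    raise AssertionError("unreachable: the last stage always returns")
```

In this sketch the thresholds are the knobs: lowering them accepts more answers from cheap stages and reduces cost, while raising them sends more prompts to larger models.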
Findings
Reduced average costs by 26% across models and datasets.
Achieved up to 70% cost reduction in best cases.
Maintained or improved accuracy compared to single-model schemes.
Abstract
The rapid advancement of large language models (LLMs) has significantly improved code completion tasks, yet the trade-off between accuracy and computational cost remains a critical challenge. While using larger models and incorporating inference-time self-testing algorithms can significantly improve output accuracy, they incur substantial computational expenses at the same time. Furthermore, servers in real-world scenarios usually have a dynamic preference on the cost-accuracy tradeoff, depending on the budget, bandwidth, the concurrent user volume, and users' sensitivity to wrong answers. In this work, we introduce a novel framework combining model cascading and inference-time self-feedback algorithms to find multiple near-optimal self-testing options on the cost-accuracy tradeoff in LLM-based code generation. Our approach leverages self-generated tests to both enhance accuracy and…
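The abstract speaks of finding multiple near-optimal options on the cost-accuracy tradeoff. One generic way to read this is as a Pareto-front selection over candidate configurations, sketched below. This is a standard technique swapped in for illustration; the source does not say the paper uses this exact procedure, and the `Config` type and its fields are hypothetical names.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    label: str       # hypothetical identifier for a cascade configuration
    cost: float      # measured cost on a validation set, arbitrary units
    accuracy: float  # measured accuracy on the same validation set


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration beats on both axes.

    A configuration is dropped if another one costs no more and is at least
    as accurate (exact duplicates keep only their first occurrence).
    """
    # Cheapest first; among equal costs, the most accurate comes first.
    ordered = sorted(configs, key=lambda c: (c.cost, -c.accuracy))
    front: List[Config] = []
    best_accuracy = float("-inf")
    for config in ordered:
        # Everything seen so far is at least as cheap, so a configuration
        # survives only if it is strictly more accurate than all of them.
        if config.accuracy > best_accuracy:
            front.append(config)
            best_accuracy = config.accuracy
    return front
```

A server with a changing budget could then pick, from the returned list, the most accurate configuration whose cost fits the current limit.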
Taxonomy
Topics: Natural Language Processing Techniques · Model-Driven Software Engineering Techniques · Software Engineering Research
Methods: Sparse Evolutionary Training
