Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing

Boyuan Chen; Mingzhi Zhu; Brendan Dolan-Gavitt; Muhammad Shafique; Siddharth Garg

arXiv:2405.15842·cs.SE·February 17, 2025

TL;DR

This paper introduces a cascaded black-box framework combining model cascading and self-testing to optimize the cost-accuracy tradeoff in LLM-based code completion, reducing costs significantly while maintaining or improving accuracy.

Contribution

It proposes a novel, inference-time, black-box framework that dynamically balances model size and self-testing to optimize cost and accuracy in code generation.
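One common way to reason about a cost-accuracy tradeoff like this is to keep only the configurations that no other configuration beats on both cost and accuracy (the Pareto front), then pick the cheapest one that meets the current accuracy target. The sketch below illustrates that idea only; this summary does not say the paper uses this exact procedure, and the `Config` type, the function names, and any numbers passed in are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    """A hypothetical (model, self-testing setting) option with measured cost and accuracy."""
    label: str
    cost: float
    accuracy: float


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return configurations not dominated in (lower cost, higher accuracy)."""
    front: List[Config] = []
    best_acc = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, try the more accurate first.
    for c in sorted(configs, key=lambda c: (c.cost, -c.accuracy)):
        # Keep a configuration only if it is more accurate than every cheaper one.
        if c.accuracy > best_acc:
            front.append(c)
            best_acc = c.accuracy
    return front


def cheapest_meeting(configs: List[Config], min_accuracy: float) -> Optional[Config]:
    """Pick the cheapest Pareto-optimal configuration reaching min_accuracy, if any."""
    for c in pareto_front(configs):  # already ordered by increasing cost
        if c.accuracy >= min_accuracy:
            return c
    return None
```

A server whose preference shifts with load could call `cheapest_meeting` again with a different `min_accuracy` rather than re-measuring anything, since the front itself is computed once from offline measurements.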

Findings

1. Reduced average costs by 26% across models and datasets.

2. Achieved up to 70% cost reduction in best cases.

3. Maintained or improved accuracy compared to single-model schemes.

Abstract

The rapid advancement of large language models (LLMs) has significantly improved code completion tasks, yet the trade-off between accuracy and computational cost remains a critical challenge. While using larger models and incorporating inference-time self-testing algorithms can significantly improve output accuracy, they incur substantial computational expenses at the same time. Furthermore, servers in real-world scenarios usually have a dynamic preference on the cost-accuracy tradeoff, depending on the budget, bandwidth, the concurrent user volume, and users' sensitivity to wrong answers. In this work, we introduce a novel framework combining model cascading and inference-time self-feedback algorithms to find multiple near-optimal self-testing options on the cost-accuracy tradeoff in LLM-based code generation. Our approach leverages self-generated tests to both enhance accuracy and…
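The general idea of cascading with self-testing can be sketched as follows: try the cheapest model first, score its candidate solutions against tests that the same model generated, and escalate to a larger model only when the best score falls below a threshold. This is a minimal sketch under stated assumptions, not the authors' implementation. The `Stage` fields, the pass-rate acceptance rule, and the cost model (cost per call times number of generations) are all hypothetical, and `exec` is used only for illustration; a real system would need a sandbox with timeouts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    """One cascade level (hypothetical interface, not from the paper's code)."""
    name: str
    cost_per_call: float                         # assumed cost of one generation
    gen_solutions: Callable[[str, int], List[str]]
    gen_tests: Callable[[str, int], List[str]]
    n_solutions: int                             # how many solutions to sample
    n_tests: int                                 # how many self-generated tests to sample
    threshold: float                             # minimum pass rate to accept an answer


def passes(solution: str, test: str) -> bool:
    """Run one test against one solution. Unsafe outside a sandbox; no timeout."""
    env: dict = {}
    try:
        exec(solution, env)   # define the candidate function(s)
        exec(test, env)       # run an assert-style test against them
        return True
    except Exception:
        return False


def run_cascade(prompt: str, stages: List[Stage]) -> Tuple[Optional[str], str, float]:
    """Return (chosen solution, name of the stage that answered, total cost)."""
    total_cost = 0.0
    for i, stage in enumerate(stages):
        solutions = stage.gen_solutions(prompt, stage.n_solutions)
        tests = stage.gen_tests(prompt, stage.n_tests)
        total_cost += stage.cost_per_call * (stage.n_solutions + stage.n_tests)

        # Score each solution by the fraction of self-generated tests it passes.
        scored = [
            (sum(passes(s, t) for t in tests) / max(len(tests), 1), s)
            for s in solutions
        ]
        score, best = max(scored, key=lambda pair: pair[0])

        # Accept if confident enough, or if there is no larger model left.
        if score >= stage.threshold or i == len(stages) - 1:
            return best, stage.name, total_cost
    return None, "", total_cost
```

In this sketch, the per-stage `n_solutions`, `n_tests`, and `threshold` values are the knobs that move a deployment along the cost-accuracy tradeoff: stricter thresholds escalate more often and cost more.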

Taxonomy

Topics: Natural Language Processing Techniques · Model-Driven Software Engineering Techniques · Software Engineering Research

Methods: Sparse Evolutionary Training