SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
Kishanthan Thangarajah, Boyuan Chen, Ahmed E. Hassan

TL;DR
SynConfRoute is a training-free, syntax-aware routing method for small code LLMs. It escalates only the difficult completion requests to a larger model, which reduces resource use while keeping code completion quality high.
Contribution
The paper introduces SynConfRoute, a syntax-aware routing approach that improves code completion performance and efficiency without additional training and applies across multiple programming languages.
Findings
Model family and training matter more than size for code LLMs.
SynConfRoute improves pass@1 by up to 31% on multi-language tasks.
The pipeline reduces accelerator usage by 58% while maintaining high accuracy.
Abstract
Enterprises want AI code completion that is both high-quality and private, but they face a tension: proprietary models yield better results yet risk exposing proprietary code, while self-hosting large models is expensive and hard to maintain. As a lighter alternative, small CodeLLMs (1B-3B) can run on a developer's workstation accelerator with code never leaving the machine, but they fail on harder tasks. A practical solution is to use the small model for most requests and selectively route difficult ones to a larger self-hosted model. In this study, we evaluate 29 code-specialized LLMs (0.5B-480B) from 12 families on execution-based fill-in-the-middle (FIM) code completion benchmarks across Python, Java, and C++, and find that model family and code-specialized training matter more than size: a 3B model matches a 32B model despite being 10x smaller. Analyzing the 3B model's failures, we…
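The routing idea described above (serve most requests with the small model, escalate the hard ones) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated before the method is described, so the specific signals used here are assumptions. The sketch assumes two hypothetical callables, `small_model` and `large_model`, a syntax check using Python's standard `ast` module (Python only, although the paper also covers Java and C++), and a hypothetical confidence threshold on the geometric-mean token probability.

```python
import ast
import math
from typing import Callable, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# the small model returns a completion plus per-token log-probabilities,
# the large model returns only a completion.
SmallModel = Callable[[str, str], Tuple[str, Sequence[float]]]
LargeModel = Callable[[str, str], str]


def is_syntactically_valid(prefix: str, completion: str, suffix: str) -> bool:
    """Return True if the filled-in file parses as Python source."""
    try:
        ast.parse(prefix + completion + suffix)
        return True
    except (SyntaxError, ValueError):
        return False


def mean_token_probability(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability; 0.0 when no tokens were produced."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(
    prefix: str,
    suffix: str,
    small_model: SmallModel,
    large_model: LargeModel,
    threshold: float = 0.8,  # assumed value, for illustration only
) -> Tuple[str, str]:
    """Keep the small model's completion if it parses and looks confident;
    otherwise escalate the request to the large model."""
    completion, logprobs = small_model(prefix, suffix)
    confident = mean_token_probability(logprobs) >= threshold
    if confident and is_syntactically_valid(prefix, completion, suffix):
        return completion, "small"
    return large_model(prefix, suffix), "large"
```

Because both checks run on the small model's output at inference time, a router of this shape needs no additional training, which matches the training-free property stated in the summary. The threshold would have to be tuned on held-out data in practice.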
