SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

Kishanthan Thangarajah; Boyuan Chen; Ahmed E. Hassan

arXiv:2605.04894 · cs.SE · May 7, 2026

TL;DR

SynConfRoute is a training-free, syntax-aware routing method for code completion: a small CodeLLM handles most requests, and only the difficult ones are escalated to a larger model, which reduces resource use while maintaining high-quality completions.

Contribution

The paper introduces SynConfRoute, a syntax-aware routing approach that improves code completion accuracy and efficiency without additional training and applies across multiple programming languages.

Findings

01. Model family and training matter more than size for code LLMs.

02. SynConfRoute improves pass@1 by up to 31% on multi-language tasks.

03. The pipeline reduces accelerator usage by 58% while maintaining high accuracy.

Abstract

Enterprises want AI code completion that is both high-quality and private, but they face a tension: proprietary models yield better results yet risk exposing proprietary code, while self-hosting large models is expensive and hard to maintain. As a lighter alternative, small CodeLLMs (1B-3B) can run on a developer's workstation accelerator, so code never leaves the machine, but they fail on harder tasks. A practical solution is to use the small model for most requests and selectively route difficult ones to a larger self-hosted model. In this study, we evaluate 29 code-specialized LLMs (0.5B-480B) from 12 families on execution-based fill-in-the-middle (FIM) code completion benchmarks across Python, Java, and C++, and find that model family and code-specialized training matter more than size: a 3B model matches a 32B model despite being 10x smaller. Analyzing the 3B model's failures, we…
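The small-model-first routing described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it states how SynConfRoute decides which requests are difficult, so everything below is an assumption made for illustration: the two signals (a parse check on the assembled code and a geometric-mean token probability), the 0.8 threshold, and all names (`Completion`, `route_completion`, the model callables) are hypothetical and not taken from the paper.

```python
import ast
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Completion:
    """A fill-in-the-middle completion plus per-token log-probabilities."""
    text: str
    token_logprobs: List[float]


# Hypothetical model interface: (prefix, suffix) -> Completion.
Model = Callable[[str, str], Completion]


def parses_as_python(prefix: str, middle: str, suffix: str) -> bool:
    """Assumed syntax signal: does prefix + middle + suffix parse as Python?"""
    try:
        ast.parse(prefix + middle + suffix)
        return True
    except (SyntaxError, ValueError):
        return False


def geometric_mean_probability(token_logprobs: List[float]) -> float:
    """Assumed confidence signal: exp of the mean token log-probability."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route_completion(
    prefix: str,
    suffix: str,
    small_model: Model,
    large_model: Model,
    confidence_threshold: float = 0.8,  # illustrative value, not from the paper
) -> Tuple[str, str]:
    """Try the small model first; escalate if either assumed signal fails.

    Returns the chosen completion text and which model produced it.
    """
    candidate = small_model(prefix, suffix)
    syntax_ok = parses_as_python(prefix, candidate.text, suffix)
    confident = (
        geometric_mean_probability(candidate.token_logprobs)
        >= confidence_threshold
    )
    if syntax_ok and confident:
        return candidate.text, "small"
    return large_model(prefix, suffix).text, "large"
```

In this sketch the large model is called only when the small model's output fails one of the checks, which is what keeps most requests on the workstation. A real deployment would need a parser for each target language (the paper covers Python, Java, and C++) and a threshold tuned on held-out data.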
