Token Sugar: Making Source Code Sweeter for LLMs through Token-Efficient Shorthand

Zhensu Sun; Chengran Yang; Xiaoning Du; Zhou Yang; Li Li; David Lo

arXiv:2512.08266 · cs.SE · December 10, 2025

Open Access

TL;DR

Token Sugar replaces common, verbose code patterns with reversible shorthands, cutting token counts by up to 15.1% in source code and up to 11.2% during LLM generation while keeping Pass@1 scores nearly identical.

Contribution

The paper presents a systematic approach to mine high-frequency code patterns and create reversible shorthands, enhancing token efficiency at the semantic level for LLM training.
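To make the idea of a reversible shorthand concrete, here is a minimal Python sketch. It is not the paper's implementation: the two rules, the `§FRL` and `§MAIN` markers, and the function names `sweeten` and `desugar` are all hypothetical and chosen only for illustration. The sketch rewrites verbose patterns into shorter forms and then restores the original text exactly.

```python
import re

# Hypothetical rules, NOT taken from the paper. Each rule holds:
# (verbose regex, shorthand template, shorthand regex, verbose template).
# The round trip is lossless only if the source never contains the
# marker character "§" on its own, which this sketch simply assumes.
RULES = [
    (
        re.compile(r"for (\w+) in range\(len\((\w+)\)\):"),
        r"§FRL \1 \2:",
        re.compile(r"§FRL (\w+) (\w+):"),
        r"for \1 in range(len(\2)):",
    ),
    (
        re.compile(r'if __name__ == "__main__":'),
        "§MAIN:",
        re.compile(r"§MAIN:"),
        'if __name__ == "__main__":',
    ),
]


def sweeten(code: str) -> str:
    """Replace each verbose pattern with its (hypothetical) shorthand."""
    for verbose_re, short_tpl, _, _ in RULES:
        code = verbose_re.sub(short_tpl, code)
    return code


def desugar(code: str) -> str:
    """Expand each shorthand back into the original verbose pattern."""
    for _, _, short_re, verbose_tpl in RULES:
        code = short_re.sub(verbose_tpl, code)
    return code


source = (
    "for i in range(len(xs)):\n"
    "    print(xs[i])\n"
    'if __name__ == "__main__":\n'
    "    main()\n"
)
shortened = sweeten(source)
restored = desugar(shortened)
```

Character length is only a crude stand-in here. Whether a shorthand actually saves tokens depends on the tokenizer in use, which this sketch does not model.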

Findings

01. Up to 15.1% token reduction in source code

02. Up to 11.2% token savings during generation

03. Maintains near-identical Pass@1 scores

Abstract

Large language models (LLMs) have shown exceptional performance in code generation and understanding tasks, yet their high computational costs hinder broader adoption. One important factor is the inherent verbosity of programming languages, such as unnecessary formatting elements and lengthy boilerplate code. This leads to inflated token counts in both input and generated outputs, which increases inference costs and slows down the generation process. Prior work improves this through simplifying programming language grammar, reducing token usage across both code understanding and generation tasks. However, it is confined to syntactic transformations, leaving significant opportunities for token reduction unrealized at the semantic level. In this work, we propose Token Sugar, a concept that replaces frequent and verbose code patterns with reversible, token-efficient shorthand in the…

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Machine Learning in Materials Science