The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
Warren Johnson

TL;DR
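This paper investigates the "perplexity paradox" in large language model prompts: under perplexity-based compression, code prompts hold up better than math prompts because high-perplexity code syntax tokens are kept while low-perplexity numerical values are pruned. It also introduces an adaptive compression method, TAAC, that reduces cost while maintaining quality.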
Contribution
It validates the perplexity paradox across multiple benchmarks, analyzes token-level perplexity differences, and proposes the TAAC adaptive compression algorithm.
Findings
Code syntax tokens have high perplexity and are preserved during compression.
Numerical values in math problems have low perplexity and are often pruned.
TAAC achieves significant cost reduction with high quality preservation.
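The first two findings follow from how perplexity-based pruning ranks tokens. The sketch below is a minimal illustration, not the paper's implementation: it assumes r is the fraction of tokens retained, the function name `prune_by_surprisal` is hypothetical, and the per-token surprisal scores are made-up values chosen only to mimic the reported pattern (numbers score low, content words score high).

```python
import math


def prune_by_surprisal(tokens, surprisals, r):
    """Keep the ceil(r * n) highest-surprisal tokens, in their original order.

    Hypothetical helper for illustration; assumes r is the retained fraction.
    """
    if len(tokens) != len(surprisals):
        raise ValueError("tokens and surprisals must have the same length")
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    k = math.ceil(r * len(tokens))
    # Rank token positions from most to least surprising.
    ranked = sorted(range(len(tokens)), key=lambda i: surprisals[i], reverse=True)
    # Restore original order for the positions that survive.
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


# Made-up scores (not measured data): the two numbers get the lowest surprisal.
tokens = ["Tom", "has", "12", "apples", "and", "eats", "5"]
surprisals = [6.0, 1.5, 1.0, 4.0, 1.2, 3.5, 0.8]

compressed = prune_by_surprisal(tokens, surprisals, r=0.6)
print(compressed)  # ['Tom', 'has', 'apples', 'and', 'eats']
```

With these illustrative scores, both quantities ("12" and "5") are dropped even though the question cannot be answered without them, which is the failure mode the findings describe for math prompts.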
Abstract
In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox" mechanism unvalidated, and provided no adaptive algorithm. This paper addresses all three gaps. First, we validate across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming the compression threshold generalizes across languages and difficulties. Second, we conduct the first per-token perplexity analysis (n=723 tokens), revealing a "perplexity paradox": code syntax tokens are preserved (high perplexity) while numerical values in math problems are pruned despite being task-critical (low perplexity). Signature injection recovers +34 percentage…
Taxonomy
Topics: Logic, programming, and type systems · Parallel Computing and Optimization Techniques · Teaching and Learning Programming
