Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning
Jingjing Wang, Xiwen Chen, Wenhui Zhu, Huayu Li, Zhengxiao He, Feiyang Cai, Ana S. Carreon-Rascon, Xuanzhao Dong, Feng Luo

TL;DR
This paper introduces LaMR, a structured pruning framework that decomposes code relevance into semantic and dependency dimensions, improving the token efficiency and accuracy of coding agents by filtering out irrelevant code segments.
Contribution
LaMR is a novel multi-rubric structured pruning method that models heterogeneous relevance patterns with dedicated CRFs and derives its supervision from AST-based analysis, matching or outperforming unpruned baselines.
Findings
LaMR matches or outperforms full-context baselines on four benchmarks.
It saves up to 31% of tokens in multi-turn tasks.
It improves Exact Match scores by up to +3.5 on single-turn tasks.
Abstract
LLM-powered coding agents spend the majority of their token budget reading repository files, yet much of the retrieved code is irrelevant to the task at hand. Existing learned pruners compress this context with a single-objective sequence labeler, collapsing all facets of code relevance into one score and one transition matrix. We show that this formulation creates a modeling bottleneck: a single CRF transition prior must serve heterogeneous retention patterns, including contiguous semantic spans and sparse structural support lines. We propose LaMR (Latent Multi-Rubric), a structured pruning framework that decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated CRF with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights the per-rubric emissions conditioned on…
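The abstract describes three pieces: one CRF per rubric, each with its own transition matrix, and a gating network that weights the per-rubric emissions. The gating inputs and the way per-rubric decisions are merged are cut off in this excerpt. The following is therefore only a toy sketch under stated assumptions:

* each code line has two labels, drop and keep;
* a per-line softmax gate scales each rubric's emission scores;
* each rubric is decoded with Viterbi under its own transition matrix;
* the final keep mask is the OR of the rubric decisions.

The names `viterbi`, `softmax`, `prune_mask` and the OR merge are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Sequence

DROP, KEEP = 0, 1  # per-line labels


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label path of a linear-chain CRF.

    emissions[t][y]    score of label y on line t
    transitions[a][b]  score of moving from label a (line t-1) to label b (line t)
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for b in range(n_labels):
            # Best previous label for ending in label b on line t.
            best_a = max(range(n_labels), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            ptrs.append(best_a)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the highest-scoring final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for ptrs in reversed(backptrs):
        y = ptrs[y]
        path.append(y)
    return path[::-1]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prune_mask(rubric_emissions, rubric_transitions, gate_logits) -> List[bool]:
    """Hypothetical multi-rubric pruning step.

    rubric_emissions[r][t][y]  emission scores produced for rubric r
    rubric_transitions[r]      2x2 transition matrix dedicated to rubric r
    gate_logits[t][r]          gating logits for line t (assumed given)
    """
    n_lines = len(gate_logits)
    gates = [softmax(g) for g in gate_logits]
    keep = [False] * n_lines
    for r, (em, trans) in enumerate(zip(rubric_emissions, rubric_transitions)):
        # Assumption: the gate scales rubric r's emissions on each line.
        weighted = [[gates[t][r] * s for s in em[t]] for t in range(n_lines)]
        path = viterbi(weighted, trans)
        # Assumption: a line is kept if any rubric's CRF keeps it.
        keep = [k or y == KEEP for k, y in zip(keep, path)]
    return keep
```

In this toy setting, consider the same emissions `[[0.0, 1.0], [0.2, 0.0], [0.0, 1.0]]` under two transition matrices:

* A "sticky" matrix `[[0.0, -1.0], [-1.0, 0.0]]` penalizes label switches. It decodes to `[1, 1, 1]`, filling the gap to keep a contiguous span.
* A "sparse" matrix `[[0.0, 0.0], [0.0, -1.0]]` penalizes consecutive keeps. It decodes to `[1, 0, 1]`, keeping isolated lines.

A single shared transition prior would have to compromise between these two behaviors.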
