Compressing Code Context for LLM-based Issue Resolution

Haoxiang Jia; Earl T. Barr; Sergey Mechtaev

arXiv:2603.28119 · cs.SE · March 31, 2026

TL;DR

This paper introduces a novel framework for compressing code context in LLM-based issue resolution. It reduces the token budget by up to 71.3% while maintaining or improving bug-fixing effectiveness.

Contribution

It proposes Oracle-guided Code Distillation (OCD) and a lightweight compression model, SWEzze. Together they distill code contexts down to the parts needed for a fix and compress them, which improves LLM issue-resolution performance.

Findings

01

SWEzze maintains a compression ratio of about 6x across models.

02

Reduces the token budget by up to 71.3%.

03

Improves issue resolution rates by up to 9.2%.

Abstract

Large Language Models (LLMs) are now capable of resolving real-world GitHub issues. However, current approaches overapproximate the code context and suffer from two compounding problems: the prohibitive cost of processing massive inputs, and low effectiveness as noise floods the context window and distracts the model from the bug-fixing signal. Existing compression techniques fail to resolve this tension: generic compressors compromise the semantic integrity of code, while code-specific tools lack the awareness of code structure and task context needed to preserve essential patch ingredients. To address this, we propose a novel framework consisting of two components. First, Oracle-guided Code Distillation (OCD), a context distillation algorithm that combines genetic search and delta debugging to systematically reduce code contexts to their minimal sufficient subsequence, retaining only the…
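The abstract says OCD uses delta debugging to shrink a code context to a minimal sufficient subsequence. The following is a minimal sketch of the classic ddmin loop applied to a list of context chunks. It is not the authors' OCD implementation, and it leaves out the genetic-search component entirely. The oracle `is_sufficient` and the example `context` are hypothetical. In the paper's setting, the oracle would presumably check whether the reduced context still lets a model produce a correct patch, but this sketch only assumes it is some predicate over a list of chunks.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def _split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous parts of near-equal size (non-empty when n <= len)."""
    k, r = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts


def ddmin(chunks: Sequence[T], is_sufficient: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `chunks` to a 1-minimal subsequence that still passes the oracle.

    The relative order of kept chunks is preserved, so the result is a
    subsequence of the input.
    """
    current = list(chunks)
    if not is_sufficient(current):
        raise ValueError("the full context must satisfy the oracle")
    n = 2  # granularity: how many parts the current context is split into
    while len(current) >= 2:
        parts = _split(current, n)
        reduced = False
        # 1) Try keeping a single part on its own.
        for part in parts:
            if is_sufficient(part):
                current, n, reduced = part, 2, True
                break
        # 2) Otherwise try dropping one part (keeping its complement).
        if not reduced:
            for i in range(len(parts)):
                complement = [x for j, p in enumerate(parts) if j != i for x in p]
                if is_sufficient(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3) Otherwise refine the granularity, or stop once parts are single chunks.
        if not reduced:
            if n >= len(current):
                break
            n = min(2 * n, len(current))
    return current


# Hypothetical usage: context lines as chunks, and a toy oracle that only
# requires two specific lines to survive.
context = [
    "import os",
    "def parse(s):",
    "    # unrelated comment",
    "    if not s:",
    "        raise ValueError('empty')",
    "def unused(): pass",
]
needed = {"def parse(s):", "        raise ValueError('empty')"}
result = ddmin(context, lambda sub: needed.issubset(sub))
print(result)  # keeps only the two required lines, in their original order
```

In the worst case, ddmin makes a number of oracle calls that grows quadratically in the number of chunks. Any oracle that is expensive to evaluate therefore dominates the cost, and this is one plausible motivation for pairing delta debugging with another search strategy. That motivation is an assumption, not a claim taken from the paper.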