Compressing Code Context for LLM-based Issue Resolution
Haoxiang Jia, Earl T. Barr, Sergey Mechtaev

TL;DR
This paper introduces a novel framework for compressing code context in LLM-based issue resolution, significantly reducing input size while maintaining or improving bug-fixing effectiveness.
Contribution
It proposes Oracle-guided Code Distillation and a lightweight compression model, SWEzze, to efficiently distill and compress code contexts for better LLM performance.
Findings
SWEzze maintains about 6x compression across models.
Reduces token budget by up to 71.3%.
Improves issue resolution rates by up to 9.2%.
Abstract
Large Language Models (LLMs) are now capable of resolving real-world GitHub issues. However, current approaches overapproximate the code context and suffer from two compounding problems: the prohibitive cost of processing massive inputs, and low effectiveness as noise floods the context window and distracts the model from the bug-fixing signal. Existing compression techniques fail to resolve this tension: generic compressors compromise the semantic integrity of code, while code-specific tools lack awareness of code structure and task context to preserve essential patch ingredients. To address this, we propose a novel framework consisting of two components. First, Oracle-guided Code Distillation (OCD), a context distillation algorithm that combines genetic search and delta debugging to systematically reduce code contexts to their minimal sufficient subsequence - retaining only the…
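To make the idea of reducing a context to a "minimal sufficient subsequence" concrete, here is a minimal sketch of classic delta debugging (ddmin) applied to a list of code snippets. It is not the paper's OCD algorithm, which additionally uses genetic search and whose oracle is not specified in the text above. The `ddmin` function, the `oracle` callback, and the toy snippets are all hypothetical and chosen for illustration only; in practice the oracle would presumably be some check that the reduced context still suffices to resolve the issue.

```python
from typing import Callable, List, Sequence


def ddmin(items: Sequence[str], oracle: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `items` to a 1-minimal subsequence that still satisfies `oracle`.

    Classic delta debugging: 1-minimal means that removing any single
    remaining item makes the oracle fail. Order of items is preserved.
    """
    current = list(items)
    if not oracle(current):
        raise ValueError("the full context must satisfy the oracle")
    n = 2  # current granularity: number of chunks to split into
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        subsets = [current[i:i + chunk] for i in range(0, len(current), chunk)]
        reduced = False
        # Step 1: does a single chunk suffice on its own?
        for subset in subsets:
            if oracle(subset):
                current, n, reduced = subset, 2, True
                break
        # Step 2: otherwise, can one chunk be dropped?
        if not reduced:
            for idx in range(len(subsets)):
                complement = [x for j, s in enumerate(subsets) if j != idx for x in s]
                if oracle(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # Step 3: otherwise, refine the granularity or stop.
        if not reduced:
            if n >= len(current):
                break  # already at single-item granularity: 1-minimal
            n = min(len(current), n * 2)
    return current


# Toy context: five snippets, of which only two are needed (hypothetical data).
context = [
    "import os",
    "def helper(): ...",
    "def buggy(x): return x + 1",
    "class Unused: ...",
    "def caller(): return buggy(1)",
]
needed = {context[2], context[4]}


def toy_oracle(candidate: List[str]) -> bool:
    # Stand-in oracle: the candidate is sufficient if it keeps both needed snippets.
    return needed <= set(candidate)


print(ddmin(context, toy_oracle))
# ['def buggy(x): return x + 1', 'def caller(): return buggy(1)']
```

Each oracle call in a real setting would be expensive, which is one plausible reason to pair this kind of reduction with a search heuristic and to train a lightweight model on the distilled results rather than run the reduction at inference time.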
