RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion
Yu Huo, Kun Zeng, Siyu Zhang, Yuquan Lu, Cheng Yang, Yifu Guo, Xiaoying Tang

TL;DR
RepoShapley improves repository-level code completion by using Shapley values to filter retrieved context, keeping helpful chunks and dropping harmful ones in retrieval-augmented generation.
Contribution
The paper introduces a coalition-aware context filtering framework whose offline labeling module, ChunkShapley, estimates chunk utility with Shapley values to guide context selection.
Findings
Improves code completion quality across benchmarks.
Reduces harmful context and unnecessary retrieval.
Uses Shapley values for effective context filtering.
Abstract
Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our offline labeling module, ChunkShapley, estimates signed per-chunk effects via teacher-forced probing, feeds them into a lightweight surrogate game that captures saturation and interference, computes exact Shapley values for small retrieval sets, and selects a decoding-optimal coalition through bounded post-verification with the frozen generator. The verified keep/drop decisions and retrieval triggers are then distilled into a single model via discrete control…
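To make the "exact Shapley values for small retrieval sets" step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the surrogate game below (a tanh-saturated sum of signed per-chunk effects minus a fixed penalty for conflicting pairs), the names `make_surrogate_game` and `exact_shapley`, and all numeric values are hypothetical. The abstract does not specify the form of the surrogate game, and the teacher-forced probing and post-verification steps are not modeled here.

```python
from itertools import combinations
from math import factorial, tanh


def make_surrogate_game(effects, conflicts, penalty=0.5):
    """Hypothetical surrogate game over retrieved chunks (an assumption).

    effects:   signed per-chunk effects, e.g. from some probing step
    conflicts: pairs of chunk indices assumed to interfere when both are kept
    """
    def value(coalition):
        # tanh gives diminishing returns (saturation) as helpful chunks pile up
        gain = tanh(sum(effects[i] for i in coalition))
        # each conflicting pair fully inside the coalition costs a fixed penalty
        clash = sum(penalty for a, b in conflicts
                    if a in coalition and b in coalition)
        return gain - clash
    return value


def exact_shapley(n, value):
    """Exact Shapley values for players 0..n-1 by enumerating all subsets.

    The cost grows as O(n * 2^n), so this is only practical for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # weight of a coalition of size k that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Illustrative numbers only: three retrieved chunks, chunks 0 and 2 conflict.
effects = [0.8, 0.6, -0.3]
game = make_surrogate_game(effects, conflicts=[(0, 2)])
phi = exact_shapley(len(effects), game)

# Naive keep rule (an assumption): keep chunks with positive Shapley value.
keep = [i for i, p in enumerate(phi) if p > 0]
```

Because the enumeration is exponential in the number of chunks, it only suits the small retrieval sets the abstract mentions. The thresholded `keep` list is a simplification; the abstract describes selecting the final coalition through bounded post-verification with the frozen generator instead.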
