PatchRecall: Patch-Driven Retrieval for Automated Program Repair
Mahir Labib Dihan, Faria Binta Awal, Md. Ishrak Ahsan

TL;DR
PatchRecall is a hybrid retrieval method that improves the selection of relevant files for automated program repair by combining codebase and history-based strategies, achieving higher recall with fewer files.
Contribution
PatchRecall introduces a novel hybrid retrieval approach that balances recall against conciseness, improving the effectiveness of automated program repair.
Findings
PatchRecall achieves higher recall without significantly increasing the number of retrieved files.
The method improves the effectiveness of automated program repair on SWE-Bench.
Combining codebase and history-based retrieval strategies outperforms individual approaches.
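These findings are stated in terms of recall and the size of the retrieved set. The sketch below shows one way to compute that metric. It assumes recall is measured at the file level against the files edited by the reference (gold) patch. The helper `file_recall` and all file names are hypothetical, and the numbers illustrate the metric only. They are not results from the paper.

```python
def file_recall(retrieved, gold):
    """Fraction of gold files (assumed: files edited by the reference patch) found in `retrieved`."""
    if not gold:
        raise ValueError("gold set must be non-empty")
    return len(set(retrieved) & set(gold)) / len(gold)


# Hypothetical toy example: file names are made up for illustration.
gold = {"src/parser.py", "src/utils.py"}
codebase_only = ["src/parser.py", "src/lexer.py", "src/ast.py"]
hybrid = ["src/parser.py", "src/utils.py", "src/lexer.py"]

for name, files in [("codebase only", codebase_only), ("hybrid", hybrid)]:
    # Report recall alongside the retrieval budget (number of files returned).
    print(f"{name}: recall={file_recall(files, gold):.2f}, files={len(files)}")
# codebase only: recall=0.50, files=3
# hybrid: recall=1.00, files=3
```

Reporting recall next to the file count matters because recall alone can always be raised by retrieving more files.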
Abstract
Retrieving the correct set of files from a large codebase is a crucial step in Automated Program Repair (APR). High recall is necessary to ensure that the relevant files are included, but simply increasing the number of retrieved files introduces noise and degrades efficiency. To address this tradeoff, we propose PatchRecall, a hybrid retrieval approach that balances recall with conciseness. Our method combines two complementary strategies: (1) codebase retrieval, where the current issue description is matched against the codebase to surface potentially relevant files, and (2) history-based retrieval, where similar past issues are leveraged to identify edited files as candidate targets. Candidate files from both strategies are merged and reranked to produce the final retrieval set. Experiments on SWE-Bench demonstrate that PatchRecall achieves higher recall without significantly…
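The two-strategy pipeline in the abstract can be illustrated with a minimal sketch. The paper's actual similarity models and reranker are not described here, so this sketch makes several stand-in choices, all assumptions:

- Token-overlap (Jaccard) similarity replaces whatever retriever the authors use.
- Reciprocal rank fusion replaces their merge-and-rerank step.
- Every function name, parameter (`top_issues`, `k`, `c`), and data item is hypothetical.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercased alphanumeric tokens: a crude stand-in for a real retriever."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def codebase_retrieval(issue, files):
    """Strategy 1: rank files by overlap between the issue and each file's path + content."""
    query = tokens(issue)
    scored = [(path, jaccard(query, tokens(path + " " + text))) for path, text in files.items()]
    scored.sort(key=lambda item: -item[1])  # stable sort: ties keep insertion order
    return [path for path, score in scored if score > 0]


def history_retrieval(issue, past_issues, top_issues=1):
    """Strategy 2: find the most similar past issues and collect the files their fixes edited."""
    query = tokens(issue)
    ranked = sorted(past_issues, key=lambda past: -jaccard(query, tokens(past["text"])))
    candidates = []
    for past in ranked[:top_issues]:
        for path in past["edited_files"]:
            if path not in candidates:  # de-duplicate while preserving order
                candidates.append(path)
    return candidates


def merge_and_rerank(rankings, k=2, c=60):
    """Fuse ranked lists with reciprocal rank fusion (a stand-in) and keep the top-k files."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] += 1.0 / (c + rank)
    return sorted(scores, key=lambda path: -scores[path])[:k]


# Hypothetical toy repository and issue history (all names and text are made up).
files = {
    "src/parser.py": "def parse(tokens): build the syntax tree from tokens",
    "src/lexer.py": "def tokenize(source): split the input source into tokens",
    "src/utils.py": "def format_error(msg): helper that formats error text",
}
past_issues = [
    {"text": "parser shows an unclear error message", "edited_files": ["src/parser.py", "src/utils.py"]},
    {"text": "lexer is slow on large input files", "edited_files": ["src/lexer.py"]},
]
issue = "parse crashes in parser with an unclear error on empty input"

from_codebase = codebase_retrieval(issue, files)
from_history = history_retrieval(issue, past_issues)
final = merge_and_rerank([from_codebase, from_history], k=2)
print(from_codebase)  # ['src/parser.py', 'src/lexer.py', 'src/utils.py']
print(from_history)   # ['src/parser.py', 'src/utils.py']
print(final)          # ['src/parser.py', 'src/utils.py']
```

With a budget of two files, codebase retrieval alone would return `src/parser.py` and `src/lexer.py`. The history signal promotes `src/utils.py`, which the fix for a similar past issue also edited, without enlarging the final set.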
