SWE-Bench-CL: Continual Learning for Coding Agents
Thomas Joshi, Shayan Chowdhury, Fatih Uysal

TL;DR
This paper introduces SWE-Bench-CL, a comprehensive continual learning benchmark for coding agents that evaluates their ability to learn from evolving software development tasks while resisting forgetting.
Contribution
It presents a new benchmark with a dataset, evaluation framework, and metrics specifically designed for continual learning in software engineering tasks.
Findings
Memory-enabled agents outperform memory-disabled ones.
The benchmark reveals significant challenges in knowledge transfer and retention.
The framework facilitates reproducible evaluation of adaptive coding agents.
Abstract
Large Language Models (LLMs) have achieved impressive results on static code-generation benchmarks, but real-world software development unfolds as a continuous stream of evolving issues, fixes, and feature requests. We introduce SWE-Bench-CL, a novel continual learning benchmark built on the human-verified SWE-Bench Verified dataset introduced by OpenAI and Princeton-NLP in 2024. By organizing GitHub issues into chronologically ordered sequences that reflect natural repository evolution, SWE-Bench-CL enables direct evaluation of an agent's ability to accumulate experience, transfer knowledge across tasks, and resist catastrophic forgetting. We complement the dataset with (i) a preliminary analysis of inter-task structural similarity and contextual sensitivity, (ii) an interactive LangGraph-based evaluation framework augmented with a FAISS-backed semantic memory module, and (iii) a suite…
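To illustrate what "chronologically ordered sequences" could look like in practice, here is a minimal sketch that groups tasks by repository and sorts each group by creation time. It is not the authors' pipeline. The record fields (`repo`, `issue_id`, `created_at`), the repository names, and the helper `build_sequences` are all hypothetical, chosen only for this example.

```python
from collections import defaultdict
from datetime import datetime


def build_sequences(tasks):
    """Group tasks by repository, then sort each group oldest-first.

    Assumes each task is a dict with hypothetical keys 'repo' and
    'created_at' (an ISO 8601 timestamp string without a 'Z' suffix).
    """
    by_repo = defaultdict(list)
    for task in tasks:
        by_repo[task["repo"]].append(task)
    return {
        repo: sorted(items, key=lambda t: datetime.fromisoformat(t["created_at"]))
        for repo, items in by_repo.items()
    }


# Hypothetical records, for illustration only (not taken from the benchmark).
tasks = [
    {"repo": "example/alpha", "issue_id": "a1", "created_at": "2021-06-01T12:00:00"},
    {"repo": "example/beta", "issue_id": "b1", "created_at": "2019-01-15T08:30:00"},
    {"repo": "example/alpha", "issue_id": "a2", "created_at": "2020-03-01T00:00:00"},
]

sequences = build_sequences(tasks)
alpha_order = [t["issue_id"] for t in sequences["example/alpha"]]
beta_order = [t["issue_id"] for t in sequences["example/beta"]]
```

An agent evaluated in a continual setting would then work through each repository's list in order, so that experience from earlier issues is available when later ones arrive.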
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Software Engineering Research · Artificial Intelligence in Healthcare and Education
