SWE-Bench-CL: Continual Learning for Coding Agents

Thomas Joshi; Shayan Chowdhury; Fatih Uysal

arXiv:2507.00014 · cs.LG · July 2, 2025

TL;DR

This paper introduces SWE-Bench-CL, a continual learning benchmark for coding agents that evaluates how well they accumulate experience across chronologically ordered software development tasks while resisting catastrophic forgetting.

Contribution

The paper contributes a benchmark comprising a dataset, an evaluation framework, and a set of metrics designed for continual learning on software engineering tasks.

Findings

01 Memory-enabled agents outperform memory-disabled ones.

02 The benchmark reveals significant challenges in knowledge transfer and retention.

03 The framework facilitates reproducible evaluation of adaptive coding agents.

Abstract

Large Language Models (LLMs) have achieved impressive results on static code-generation benchmarks, but real-world software development unfolds as a continuous stream of evolving issues, fixes, and feature requests. We introduce SWE-Bench-CL, a novel continual learning benchmark built on the human-verified SWE-Bench Verified dataset introduced by OpenAI and Princeton-NLP in 2024. By organizing GitHub issues into chronologically ordered sequences that reflect natural repository evolution, SWE-Bench-CL enables direct evaluation of an agent's ability to accumulate experience, transfer knowledge across tasks, and resist catastrophic forgetting. We complement the dataset with (i) a preliminary analysis of inter-task structural similarity and contextual sensitivity, (ii) an interactive LangGraph-based evaluation framework augmented with a FAISS-backed semantic memory module, and (iii) a suite…
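
The chronological ordering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Issue` record, its field names, the `build_sequences` helper, and the sample repository names are all hypothetical, and the only assumption taken from the abstract is that issues are grouped by repository and ordered by time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Issue:
    # Hypothetical record; the field names are illustrative, not the dataset schema.
    repo: str
    issue_id: str
    created_at: datetime


def build_sequences(issues):
    """Group issues by repository and sort each group oldest-first."""
    by_repo = defaultdict(list)
    for issue in issues:
        by_repo[issue.repo].append(issue)
    return {
        repo: sorted(group, key=lambda item: item.created_at)
        for repo, group in by_repo.items()
    }


# Made-up sample data, used only to exercise the helper.
issues = [
    Issue("example/repo-a", "a-2", datetime(2021, 5, 1)),
    Issue("example/repo-b", "b-1", datetime(2020, 1, 15)),
    Issue("example/repo-a", "a-1", datetime(2019, 3, 10)),
]
sequences = build_sequences(issues)
ordered_ids = [item.issue_id for item in sequences["example/repo-a"]]
```

An agent evaluated in this style would see each repository's issues in the order given by `sequences`, so anything it retains from earlier issues is available when it attempts later ones.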

Code & Models

Repositories

thomasjoshi/agents-never-forget (official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Software Engineering Research · Artificial Intelligence in Healthcare and Education