SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
Tingxu Han, Yi Zhang, Wei Song, Chunrong Fang, Zhenyu Chen, Youcheng Sun, Lijie Hu

TL;DR
This paper evaluates the real-world utility of agent skills in software engineering tasks using a new benchmark, revealing limited benefits and highlighting the importance of domain fit and context compatibility.
Contribution
Introduces SWE-Skills-Bench, a requirement-driven benchmark for isolating and evaluating the impact of agent skills in authentic software engineering scenarios.
Findings
Most skills show no improvement in pass rates.
Average gain in success rate is only +1.2%.
Some skills can even degrade performance.
Abstract
Agent skills, structured procedural knowledge packages injected at inference time, are increasingly used to augment LLM agents on software engineering tasks. However, their real utility in end-to-end development settings remains unclear. We present SWE-Skills-Bench, the first requirement-driven benchmark that isolates the marginal utility of agent skills in real-world software engineering (SWE). It pairs 49 public SWE skills with authentic GitHub repositories pinned at fixed commits and requirement documents with explicit acceptance criteria, yielding approximately 565 task instances across six SWE subdomains. We introduce a deterministic verification framework that maps each task's acceptance criteria to execution-based tests, enabling controlled paired evaluation with and without the skill. Our results show that skill injection benefits are far more limited than rapid adoption…
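The paired with/without protocol described in the abstract can be illustrated with a minimal sketch. It assumes that each task instance is run twice, once without and once with the skill injected, and that a run counts as passed only if every execution-based acceptance test passes. It also assumes that marginal utility is reported as the pass-rate difference in percentage points. All names here (`PairedResult`, `task_passed`, `marginal_utility`) are hypothetical and are not the benchmark's actual code, and the toy data below are invented for illustration, not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairedResult:
    """One task instance evaluated twice: without and with the skill (hypothetical schema)."""
    skill: str
    task_id: str
    passed_without: bool  # baseline run passed all acceptance tests
    passed_with: bool     # skill-injected run passed all acceptance tests


def task_passed(test_outcomes):
    """Assumption: a run passes only if every acceptance-criteria test passes."""
    return all(test_outcomes)


def pass_rate(flags):
    """Fraction of runs that passed; 0.0 for an empty set."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def marginal_utility(results):
    """Per-skill pass-rate delta (with skill minus without), in percentage points."""
    by_skill = {}
    for r in results:
        by_skill.setdefault(r.skill, []).append(r)
    deltas = {}
    for skill, runs in by_skill.items():
        baseline = pass_rate(r.passed_without for r in runs)
        with_skill = pass_rate(r.passed_with for r in runs)
        deltas[skill] = 100.0 * (with_skill - baseline)
    return deltas


# Toy data (illustrative only): skill "a" helps on one task, skill "b" hurts on one.
results = [
    PairedResult("a", "t1", task_passed([True, False]), task_passed([True, True])),
    PairedResult("a", "t2", True, True),
    PairedResult("b", "t3", True, False),
    PairedResult("b", "t4", True, True),
]
deltas = marginal_utility(results)
print(deltas)               # {'a': 50.0, 'b': -50.0}
print(mean(deltas.values()))  # 0.0
```

Pairing each task with itself controls for task difficulty. Any difference in the outcome is attributed to the skill rather than to which tasks happened to be sampled.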
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Advanced Software Engineering Methodologies
