Multi-Agent Systems for Dataset Adaptation in Software Engineering: Capabilities, Limitations, and Future Directions
Jingyi Chen, Xiaoyan Guo, Songqiang Chen, Shing-Chi Cheung, Jiasi Shen

TL;DR
This study empirically evaluates the capabilities and limitations of a state-of-the-art multi-agent LLM system, GitHub Copilot's agent mode backed by GPT-4.1 and Claude Sonnet 4, in automating dataset adaptation tasks in software engineering, highlighting performance gaps and improvement strategies.
Contribution
First empirical analysis of multi-agent LLM systems applied to dataset adaptation in software engineering, identifying key challenges and potential enhancements.
Findings
Systems can identify key files and generate partial adaptations.
Prompt interventions improve structural similarity from 7.25% to 67.14%.
Current systems rarely produce functionally correct implementations.
Abstract
Automating the adaptation of software engineering (SE) research artifacts across datasets is essential for scalability and reproducibility, yet it remains largely unstudied. Recent advances in large language model (LLM)-based multi-agent systems, such as GitHub Copilot's agent mode, promise to automate complex development workflows through coordinated reasoning, code generation, and tool interaction. This paper presents the first empirical study on how state-of-the-art multi-agent systems perform in dataset adaptation tasks. We evaluate Copilot, backed by GPT-4.1 and Claude Sonnet 4, on adapting SE research artifacts from benchmark repositories including ROCODE and LogHub2.0. Through a five-stage evaluation pipeline (file comprehension, code editing, command generation, validation, and final execution), we measure success rates, analyze failure patterns, and assess prompt-based…
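The five-stage pipeline named in the abstract lends itself to a simple per-stage tally. The following is a minimal sketch, not the authors' evaluation harness: it assumes the stages run in order and that a task stops at its first failing stage, which the abstract does not state. `TaskResult`, `stage_success_rates`, and `failure_counts` are hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Stage names taken from the abstract; the ordering is an assumption.
STAGES = [
    "file_comprehension",
    "code_editing",
    "command_generation",
    "validation",
    "final_execution",
]


@dataclass
class TaskResult:
    """Outcome of one adaptation task (hypothetical record layout)."""
    task_id: str
    failed_stage: Optional[str] = None  # None means every stage passed


def stage_success_rates(results: List[TaskResult]) -> Dict[str, float]:
    """Fraction of all tasks that passed each stage.

    Assumes a task that fails at stage k passed stages before k
    and did not pass stage k or any later stage.
    """
    total = len(results)
    if total == 0:
        return {stage: 0.0 for stage in STAGES}
    rates = {}
    for i, stage in enumerate(STAGES):
        passed = sum(
            1
            for r in results
            if r.failed_stage is None or STAGES.index(r.failed_stage) > i
        )
        rates[stage] = passed / total
    return rates


def failure_counts(results: List[TaskResult]) -> Counter:
    """Number of tasks whose first failure occurred at each stage."""
    return Counter(r.failed_stage for r in results if r.failed_stage is not None)


if __name__ == "__main__":
    # Made-up outcomes for illustration only; not data from the paper.
    demo = [
        TaskResult("t1"),
        TaskResult("t2", failed_stage="code_editing"),
        TaskResult("t3", failed_stage="validation"),
        TaskResult("t4", failed_stage="final_execution"),
    ]
    for stage, rate in stage_success_rates(demo).items():
        print(f"{stage}: {rate:.2%}")
    print(failure_counts(demo))
```

Reporting cumulative pass rates alongside first-failure counts separates two questions: how far tasks get through the pipeline, and where they most often break.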
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software System Performance and Reliability
