Collaborative Evolving Strategy for Automatic Data-Centric Development
Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian

TL;DR
This paper introduces AD^2, an autonomous data-centric AI development framework leveraging LLMs, with a novel collaborative evolution strategy that improves data scheduling and implementation through practical feedback.
Contribution
It pioneers the automatic data-centric development task and proposes Co-STEER, an LLM-based agent that collaboratively evolves scheduling and implementation skills.
Findings
Co-STEER significantly improves data development efficiency.
The collaborative evolution strategy enhances scheduling and implementation accuracy.
Experimental results validate the effectiveness of the proposed approach.
Abstract
Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data available for machine learning models. The emphasis is now on a data-centric AI strategy, which prioritizes data development over progress in model design. Automating this process is crucial. In this paper, we are the first to introduce the automatic data-centric development (AD^2) task and outline its core challenges, which require domain-expert-like task scheduling and implementation capability and have been largely unexplored by previous work. By leveraging the strong complex problem-solving capabilities of large language models (LLMs), we propose an LLM-based autonomous agent, equipped with a strategy named Collaborative Knowledge-STudying-Enhanced Evolution by Retrieval (Co-STEER), to address all of these challenges simultaneously. Specifically, our proposed Co-STEER agent enriches…
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Simulation Techniques and Applications
