AI Scientist via Synthetic Task Scaling

Ziyang Cai; Harkirat Behl

arXiv:2603.17216 · cs.AI · March 19, 2026

Open Access

TL;DR

This paper introduces a synthetic environment generation pipeline for training AI agents on machine learning tasks. Student models trained on the synthetic challenges, using trajectories from a teacher model, perform better on the MLGym benchmark.

Contribution

The paper presents a pipeline that automatically synthesizes machine learning challenges compatible with the SWE-agent framework, covering topic sampling, dataset proposal, and code generation, so that AI agents can be trained on them as a step toward automatic scientific discovery.

Findings

01. Synthetic tasks improve student model performance on MLGym

02. Student models trained with synthetic tasks show 9-12% AUP improvement

03. Pipeline verifies datasets against Huggingface API for quality assurance

Abstract

With the advent of AI agents, automatic scientific discovery has become a tenable goal. Many recent works scaffold agentic systems that can perform machine learning research, but don't offer a principled way to train such agents -- and current LLMs often generate plausible-looking but ineffective ideas. To make progress on training agents that can learn from doing, we provide a novel synthetic environment generation pipeline targeting machine learning agents. Our pipeline automatically synthesizes machine learning challenges compatible with the SWE-agent framework, covering topic sampling, dataset proposal, and code generation. The resulting synthetic tasks are 1) grounded in real machine learning datasets, because the proposed datasets are verified against the Huggingface API, and 2) verified for higher quality with a self-debugging loop. To validate the effectiveness of our…
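The abstract names the pipeline stages (topic sampling, dataset proposal, code generation) and two checks (dataset verification and a self-debugging loop), but not how they are wired together. The sketch below is one plausible arrangement, not the authors' implementation. Every name in it (`Task`, `build_task`, `demo`, and the four injected callables) is hypothetical, and the stubs in `demo` stand in for the LLM calls and the Huggingface lookup, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    topic: str
    dataset_id: str
    code: str


def build_task(
    topic: str,
    propose_dataset: Callable[[str], str],
    dataset_exists: Callable[[str], bool],
    generate_code: Callable[[str, str, Optional[str]], str],
    run_code: Callable[[str], Optional[str]],
    max_debug_rounds: int = 3,
) -> Optional[Task]:
    """Return a verified task for `topic`, or None if any check fails.

    All callables are hypothetical placeholders: in a real pipeline they
    would wrap an LLM and a dataset-hub lookup.
    """
    dataset_id = propose_dataset(topic)
    # Grounding check: reject proposals that name a dataset that does not exist.
    if not dataset_exists(dataset_id):
        return None

    error: Optional[str] = None
    for _ in range(max_debug_rounds):
        # The previous error (if any) is fed back so the generator can repair it.
        code = generate_code(topic, dataset_id, error)
        error = run_code(code)  # None means the code ran without error
        if error is None:
            return Task(topic, dataset_id, code)
    return None  # still failing after the allowed number of repair rounds


def demo() -> Optional[Task]:
    """Exercise the loop with in-memory stubs (no network, no LLM)."""
    known_datasets = {"imdb"}

    def generate(topic: str, dataset_id: str, error: Optional[str]) -> str:
        # First attempt is deliberately broken; the retry is "repaired".
        return "fixed" if error else "buggy"

    def run(code: str) -> Optional[str]:
        return None if code == "fixed" else "NameError: x"

    return build_task(
        "sentiment classification",
        propose_dataset=lambda topic: "imdb",
        dataset_exists=lambda dataset_id: dataset_id in known_datasets,
        generate_code=generate,
        run_code=run,
    )
```

Passing the checks in as callables keeps the control flow testable offline. The round limit is an assumption of this sketch: it bounds the cost of a task that never runs cleanly, and such a task is simply dropped.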

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Multimodal Machine Learning Applications