Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Patrick Tser Jern Kon; Jiachen Liu; Qiuyi Ding; Yiming Qiu; Zhenning Yang; Yibo Huang; Jayanth Srinivasa; Myungjin Lee; Mosharaf Chowdhury; Ang Chen

arXiv:2502.16069·cs.AI·February 27, 2025

TL;DR

Curie is an AI framework that automates scientific experimentation with enhanced rigor, reliability, control, and interpretability, demonstrated by improved performance on a novel benchmark across multiple computer science domains.

Contribution

We introduce Curie, an AI agent framework that embeds rigor into scientific experiments through specialized modules, advancing automation and reliability in scientific research.

Findings

01. 3.4× improvement over baseline in answering experimental questions
02. Effective integration of rigor modules enhances experiment reliability
03. Open-sourced implementation available for community use

Abstract

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers, and widely adopted open-source projects. Compared to the…

Code & Models

Repositories

just-curieous/curie (official)

Taxonomy

Topics: Scientific Computing and Data Management