Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts

Dhruv Trehan; Paras Chopra

arXiv:2601.03315·cs.LG·January 8, 2026

Open Access

TL;DR

This paper reports a case study of four end-to-end attempts to autonomously generate ML research papers with a pipeline of six LLM agents. It documents six recurring failure modes and discusses four design principles for more robust AI-scientist systems.

Contribution

The paper provides a detailed case study of four autonomous research attempts, identifies six recurring failure modes, and discusses four design principles for more robust AI-scientist systems.

Findings

01

Three of the four attempts failed during implementation or evaluation.

02

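One attempt completed the pipeline and was accepted to Agents4Science 2025, an experimental inaugural venue that required AI systems as first authors, after passing both human and multi-AI review.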

03

The attempts revealed six recurring failure modes, ranging from bias toward training data defaults to weak scientific taste in experimental design.

Abstract

We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Of these four, three attempts failed during implementation or evaluation. One completed the pipeline and was accepted to Agents4Science 2025, an experimental inaugural venue that required AI systems as first authors, passing both human and multi-AI review. From these attempts, we document six recurring failure modes: bias toward training data defaults, implementation drift under execution pressure, memory and context degradation across long-horizon tasks, overexcitement that declares success despite obvious failures, insufficient domain intelligence, and weak scientific taste in experimental design. We conclude by discussing four design principles for more robust AI-scientist systems, implications for autonomous…

Taxonomy

Topics: Scientific Computing and Data Management · Ethics and Social Impacts of AI · Artificial Intelligence in Law