Can AI Scientist Agents Learn from Lab-in-the-Loop Feedback? Evidence from Iterative Perturbation Discovery

Gilles Wainrib; Barbara Bodinier; Haitem Dakhli; Josep Monserrat; Almudena Espin Perez; Sabrina Carpentier; Roberta Codato; John Klein

arXiv:2603.26177·cs.LG·March 30, 2026

TL;DR

This study shows that LLM-based agents can genuinely learn from lab-in-the-loop experimental feedback: access to feedback raises discoveries per feature by 53.4% on average, provided the model is sufficiently capable.

Contribution

The paper provides evidence that LLM-based agents can use experimental feedback effectively, and shows that this in-context learning depends on both model capability and structured feedback.

Findings

01 Access to feedback increases discoveries per feature by 53.4% on average (p = 0.003).

02 The performance gain disappears when the feedback is permuted, confirming that the gain comes from feedback-driven learning.

03 Upgrading to more capable models reduces hallucination rates and improves the use of feedback.

Abstract

Recent work has questioned whether large language models (LLMs) can perform genuine in-context learning (ICL) for scientific experimental design, with prior studies suggesting that LLM-based agents exhibit no sensitivity to experimental feedback. We shed new light on this question by carrying out 800 independently replicated experiments on iterative perturbation discovery in Cell Painting high-content screening. We compare an LLM agent that iteratively updates its hypotheses using experimental feedback to a zero-shot baseline that relies solely on pretraining knowledge retrieval. Access to feedback yields a $+53.4\%$ increase in discoveries per feature on average ($p = 0.003$). To test whether this improvement arises from genuine feedback-driven learning rather than prompt-induced recall of pretraining knowledge, we introduce a random feedback control in which hit/miss labels are…
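
The permuted-feedback control mentioned in the findings can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `permute_feedback` and `relative_gain`, the perturbation names, and all numbers are hypothetical, and the sketch assumes that feedback is a mapping from each tested perturbation to a hit/miss label and that the control shuffles those labels across perturbations.

```python
import random
from typing import Dict, Optional


def permute_feedback(feedback: Dict[str, bool],
                     seed: Optional[int] = None) -> Dict[str, bool]:
    """Shuffle hit/miss labels across perturbations (hypothetical control).

    The set of perturbations and the number of hits stay the same;
    only the assignment of labels to perturbations is randomized.
    """
    rng = random.Random(seed)
    names = list(feedback)
    labels = [feedback[name] for name in names]
    rng.shuffle(labels)
    return dict(zip(names, labels))


def relative_gain(with_feedback: float, baseline: float) -> float:
    """Relative change in discoveries versus a baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (with_feedback - baseline) / baseline


# Made-up labels for four placeholder perturbations (True = hit).
true_feedback = {"pert_A": True, "pert_B": False,
                 "pert_C": False, "pert_D": True}
control_feedback = permute_feedback(true_feedback, seed=0)

# Made-up discovery counts, only to show the arithmetic.
example_gain = relative_gain(with_feedback=15.0, baseline=10.0)
```

In this sketch the shuffled control keeps the same perturbations and the same number of hits as the real feedback, so the two conditions differ only in which perturbation each label is attached to. If an agent does equally well on `control_feedback` as on `true_feedback`, its gain cannot be attributed to the information carried by the labels.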
