Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems
Jiayi Geng, Howard Chen, Dilip Arumugam, Thomas L. Griffiths

TL;DR
This paper evaluates how well large language models reverse-engineer black-box systems. It shows that active intervention improves their understanding of those systems, and it offers insights into overcoming common failure modes.
Contribution
The paper demonstrates that prompting LLMs to actively intervene on a black-box system enhances their reverse-engineering capabilities, and it provides practical strategies for AI-driven scientific discovery.
Findings
Prompting LLMs to actively intervene improves their reverse-engineering performance.
Interventions help LLMs avoid two failure modes: overcomplication and overlooking.
Sharing intervention data among LLMs further enhances understanding.
Abstract
Using AI to create autonomous researchers has the potential to accelerate scientific discovery. A prerequisite for this vision is understanding how well an AI model can identify the underlying structure of a black-box system from its behavior. In this paper, we explore how well a large language model (LLM) learns to identify a black-box function from passively observed versus actively collected data. We investigate the reverse-engineering capabilities of LLMs across three distinct types of black-box systems, each chosen to represent different problem domains where future autonomous AI researchers may have considerable impact: Program, Formal Language, and Math Equation. Through extensive experiments, we show that LLMs fail to extract information from observations, reaching a performance plateau that falls short of the ideal of Bayesian inference. However, we demonstrate that prompting…
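The contrast between passively observed and actively collected data can be illustrated with a toy sketch. This is not the paper's experimental setup: the threshold black box, the hypothesis set, and every function name below are assumptions made purely for illustration. The sketch tracks which hypotheses remain consistent with the data, first for a fixed set of observations and then for queries chosen by the learner.

```python
from typing import Callable, List, Tuple

# Hypothetical toy black box: returns True when x >= a hidden threshold.
def make_black_box(threshold: int) -> Callable[[int], bool]:
    return lambda x: x >= threshold

def consistent(hypotheses: List[int], data: List[Tuple[int, bool]]) -> List[int]:
    """Keep each candidate threshold t whose prediction (x >= t) matches all data."""
    return [t for t in hypotheses if all((x >= t) == y for x, y in data)]

def passive(box: Callable[[int], bool], inputs: List[int],
            hypotheses: List[int]) -> List[int]:
    """Passive setting: the learner receives observations it did not choose."""
    data = [(x, box(x)) for x in inputs]
    return consistent(hypotheses, data)

def active(box: Callable[[int], bool], hypotheses: List[int],
           budget: int) -> List[int]:
    """Active setting: the learner picks each query to split the candidates."""
    remaining = sorted(hypotheses)
    for _ in range(budget):
        if len(remaining) <= 1:
            break
        # Query the lower-median candidate so either answer removes
        # roughly half of the remaining hypotheses.
        x = remaining[(len(remaining) - 1) // 2]
        remaining = consistent(remaining, [(x, box(x))])
    return remaining

hypotheses = list(range(17))          # candidate thresholds 0..16
box = make_black_box(threshold=11)    # hidden from the learner

passive_left = passive(box, inputs=[0, 1, 2, 3, 4], hypotheses=hypotheses)
active_left = active(box, hypotheses, budget=5)

print(len(passive_left), "hypotheses remain after 5 passive observations")
print(len(active_left), "hypothesis remains after at most 5 active queries")
```

In this toy case five uninformative observations leave most candidates standing, while the same budget of chosen queries isolates the hidden threshold. Systems such as programs, formal languages, and math equations have far larger hypothesis spaces, so this only conveys the intuition.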
