When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning
Vasilis Niarchos, Constantinos Papageorgakis, Alexander G. Stapleton, Sokratis Trifinopoulos

TL;DR
This paper introduces SCALAR, a structured Actor--Critic framework for AI reasoning in physics, and analyzes how different interaction strategies affect research-level problem solving with large language models.
Contribution
It presents a novel Actor--Critic--Judge pipeline for physics reasoning and systematically evaluates how interaction strategies influence AI-assisted scientific discovery.
Findings
Multi-turn dialogue improves problem-solving performance over single-shot attempts.
Critic feedback is most beneficial in asymmetric Actor--Critic setups.
Increasing model scale within a model family improves performance on easier problems but not on the hardest ones.
Abstract
As large language models (LLMs) show increasing promise on research-level physics reasoning tasks and agentic AI becomes more common, a practical question emerges: How does the interaction between researchers and agents affect the results? We study this using SCALAR (Structured Critic--Actor Loop for AI Reasoning), an Actor--Critic--Judge pipeline applied to quantum field theory and string theory problems. The Actor proposes solutions, the Critic provides iterative feedback, and an independent Judge evaluates the transcript against reference solutions. We vary the Actor persona, the Critic feedback strategy, and the Actor model family and scale. Multi-turn dialogue improves over single-shot attempts throughout, but both the mechanism of improvement and the value of different prompting choices depend strongly on the Actor--Critic pairing. Increasing the scale within one model family…
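The Actor--Critic--Judge loop described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Model`, `Transcript`, `run_loop`, and `judge_transcript` are hypothetical, each model is represented as a plain function from prompt text to reply text, and the choice that only the Judge sees the reference solution is an assumption made here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in: a "model" is any function from prompt text to reply text.
# A real setup would wrap an LLM call here; that wiring is deliberately omitted.
Model = Callable[[str], str]


@dataclass
class Transcript:
    """Running record of the dialogue as (role, text) pairs."""
    problem: str
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"PROBLEM: {self.problem}"]
        lines += [f"{role.upper()}: {text}" for role, text in self.turns]
        return "\n".join(lines)


def run_loop(problem: str, actor: Model, critic: Model, max_turns: int = 3) -> Transcript:
    """Alternate Actor proposals and Critic feedback; the Actor speaks last.

    max_turns=1 reduces to a single-shot attempt with no Critic feedback.
    """
    transcript = Transcript(problem)
    for turn in range(max_turns):
        # The Actor sees the problem plus all earlier proposals and feedback.
        transcript.turns.append(("actor", actor(transcript.render())))
        if turn < max_turns - 1:
            transcript.turns.append(("critic", critic(transcript.render())))
    return transcript


def judge_transcript(transcript: Transcript, reference: str, judge: Model) -> str:
    """Assumption: only the Judge is shown the reference solution."""
    return judge(f"{transcript.render()}\nREFERENCE: {reference}")
```

With toy stand-ins such as `actor = lambda p: "attempt"` and `critic = lambda p: "check the sign"`, calling `run_loop("toy", actor, critic, max_turns=2)` produces the role sequence actor, critic, actor. Comparing `max_turns=1` with larger values mirrors the single-shot versus multi-turn comparison, and swapping the `actor` and `critic` functions corresponds to varying the Actor--Critic pairing.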
