Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications
Matteo Cobelli, Stefano Sanvito

TL;DR
This paper demonstrates an autoresearch framework where an AI agent generates and evaluates composition-based descriptors for materials property prediction, outperforming traditional methods and producing interpretable features.
Contribution
The paper introduces Automat, an autoresearch system in which a coding agent based on a large language model designs chemically interpretable, composition-only descriptors without manual feature engineering.
Findings
Automat improves prediction accuracy over baseline methods.
Generated descriptors are chemically interpretable.
Current limitations include descriptor redundancy and sensitivity to greedy expansion.
Abstract
Autoresearch offers a flexible paradigm for automating scientific tasks, in which an AI agent proposes, implements, evaluates, and refines candidate solutions against a quantitative objective. Here, we use composition-based materials-property prediction to test whether such agents can perform a task beyond model selection and hyperparameter optimization: the design of input descriptors. We introduce Automat, an autoresearch framework where a coding agent based on a large language model generates composition-only descriptors for chemical compounds and evaluates them using a random forest workflow. The agent is restricted to information derivable from chemical formulas and iteratively proposes, implements, and tests chemically motivated descriptor strategies. We apply Automat, with OpenAI Codex using GPT-5.5 as the coding agent, to the prediction of experimental band gaps in inorganic…
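To make "composition-only descriptors" concrete, the following is a minimal sketch of the kind of feature function such an agent might propose. It is an illustration under stated assumptions, not the descriptors Automat actually generated: the function names (`parse_formula`, `describe`), the feature names, and the small element table are all hypothetical, and the parser only handles flat formulas without parentheses.

```python
import re
from collections import defaultdict

# Small illustrative table (an assumption for this sketch, not from the paper):
# element -> (atomic number, Pauling electronegativity).
ELEMENT_DATA = {
    "O": (8, 3.44),
    "Na": (11, 0.93),
    "Cl": (17, 3.16),
    "Ti": (22, 1.54),
    "Zn": (30, 1.65),
    "Ga": (31, 1.81),
    "As": (33, 2.18),
}

# One element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula):
    """Return element -> atomic fraction for a flat formula such as 'TiO2'."""
    counts = defaultdict(float)
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def describe(formula):
    """Compute a few hypothetical descriptors using only the chemical formula."""
    fractions = parse_formula(formula)
    chi = {el: ELEMENT_DATA[el][1] for el in fractions}
    return {
        "n_elements": len(fractions),
        # Fraction-weighted mean electronegativity.
        "mean_electronegativity": sum(fractions[el] * chi[el] for el in fractions),
        # Spread between the most and least electronegative element.
        "electronegativity_range": max(chi.values()) - min(chi.values()),
        # Fraction-weighted mean atomic number.
        "mean_atomic_number": sum(
            fractions[el] * ELEMENT_DATA[el][0] for el in fractions
        ),
    }


if __name__ == "__main__":
    for formula in ("NaCl", "TiO2", "GaAs"):
        print(formula, describe(formula))
```

In a workflow like the one the abstract describes, vectors of this kind would be passed to a random forest model and scored against the target property, and the agent would keep, revise, or discard each descriptor strategy based on that score.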
