Foundation-Model Surrogates Enable Data-Efficient Active Learning for Materials Discovery
Jeffrey Hu, Rongzhi Dong, Ying Feng, Ming Hu, Jianjun Hu

TL;DR
This paper introduces In-Context Active Learning (ICAL), an active learning approach that uses the TabPFN foundation model as its surrogate, improving data efficiency and uncertainty calibration in materials discovery tasks.
Contribution
The paper presents ICAL, which replaces traditional surrogates (GP and RF) with a pre-trained transformer model, enabling Bayesian inference without retraining, and reports superior performance on materials datasets.
Findings
ICAL outperforms GP and RF on 8 of 10 datasets.
Achieves a 52% reduction in extra evaluations compared to GP.
Exhibits superior uncertainty calibration, with the lowest negative log-likelihood (NLL).
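For readers unfamiliar with the calibration metric, here is a minimal sketch of how a negative log-likelihood can be computed for a regression surrogate. It assumes each prediction is summarized as a Gaussian with a mean and a standard deviation; the paper's exact NLL definition is not given on this page, and the function name `gaussian_nll` is hypothetical.

```python
import math


def gaussian_nll(y_true, mean, std):
    """Average negative log-likelihood of y_true under N(mean, std^2).

    Assumption: the surrogate's predictive distribution is Gaussian.
    Lower values indicate predictions that are both accurate and
    honest about their uncertainty.
    """
    if not (len(y_true) == len(mean) == len(std)):
        raise ValueError("y_true, mean, and std must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for y, m, s in zip(y_true, mean, std):
        if s <= 0:
            raise ValueError("standard deviations must be positive")
        # Log-normalizer term plus squared error scaled by the variance.
        total += 0.5 * math.log(2 * math.pi * s * s) + (y - m) ** 2 / (2 * s * s)
    return total / len(y_true)


# The same wrong prediction is penalized far more when it is overconfident.
overconfident = gaussian_nll([1.0], [0.0], [0.1])
calibrated = gaussian_nll([1.0], [0.0], [1.0])
print(overconfident > calibrated)
```

The metric rewards a surrogate whose stated uncertainty matches its actual error, which is why it is a common way to compare calibration across models such as GP and RF.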
Abstract
Active learning (AL) has emerged as a powerful paradigm for accelerating materials discovery by iteratively steering experiments toward promising candidates, reducing the number of costly synthesis-and-characterization cycles needed to identify optimal materials. However, current AL relies predominantly on Gaussian Process (GP) and Random Forest (RF) surrogates, which suffer from complementary limitations: GP underfits complex composition-property landscapes due to rigid kernel assumptions, while RF produces unreliable heuristic uncertainty estimates in small-data regimes. This small-data challenge is pervasive in materials science, making reliable surrogate modeling extremely difficult with models trained from scratch on each new dataset. Here we propose In-Context Active Learning (ICAL), which addresses this bottleneck by replacing conventional surrogates with TabPFN, a…
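The iterative loop described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_surrogate` is a hypothetical placeholder standing in for an in-context model (it is not TabPFN), the upper-confidence-bound acquisition rule and the parameter `kappa` are assumptions, and the one-dimensional objective is made up for the example. The point it shows is that the surrogate receives the labeled set as context at every step rather than being retrained.

```python
import math
import random


def toy_surrogate(context_x, context_y, query_x, length_scale=0.2):
    """Hypothetical placeholder surrogate (NOT TabPFN).

    Returns a kernel-weighted mean of the labeled values and an
    uncertainty that grows with distance to the nearest labeled point.
    """
    weights = [math.exp(-(((query_x - x) / length_scale) ** 2)) for x in context_x]
    total = sum(weights)
    if total < 1e-12:
        mean = sum(context_y) / len(context_y)
    else:
        mean = sum(w * y for w, y in zip(weights, context_y)) / total
    nearest = min(abs(query_x - x) for x in context_x)
    std = 1.0 - math.exp(-((nearest / length_scale) ** 2))
    return mean, std


def active_learning(pool, oracle, n_init=3, budget=10, kappa=1.0, seed=0):
    """Maximize oracle over pool with a fixed number of extra evaluations."""
    rng = random.Random(seed)
    # Label a small random initial set.
    labels = {i: oracle(pool[i]) for i in rng.sample(range(len(pool)), n_init)}
    for _ in range(budget):
        ctx_x = [pool[i] for i in labels]
        ctx_y = [labels[i] for i in labels]
        best_i, best_score = None, -math.inf
        for i, x in enumerate(pool):
            if i in labels:
                continue
            mean, std = toy_surrogate(ctx_x, ctx_y, x)
            score = mean + kappa * std  # assumed UCB acquisition
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:  # pool exhausted
            break
        # "Run the experiment" on the chosen candidate and add it to the context.
        labels[best_i] = oracle(pool[best_i])
    best = max(labels, key=labels.get)
    return pool[best], labels[best], len(labels)


# Made-up objective with its maximum near x = 0.7.
pool = [i / 49 for i in range(50)]
best_x, best_y, n_evals = active_learning(pool, lambda x: -((x - 0.7) ** 2))
print(best_x, best_y, n_evals)
```

Swapping in a different surrogate only requires replacing `toy_surrogate` with any function that maps a labeled context and a query point to a predictive mean and standard deviation.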
Taxonomy
Topics: Machine Learning in Materials Science · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning
