Large Language Model Assisted Discovery of Optimal Dopants for Enhanced Thermoelectric Performance in CoSb$_3$ Based Skutterudites
Yagnik Bandyopadhyay, Dylan Noel Serrao, Houlong L. Zhuang

TL;DR
This paper introduces a data-driven method that combines large language models, machine learning, and atomistic simulations (density functional theory and molecular dynamics) to discover and optimize dopants for high-performance thermoelectric CoSb$_3$ skutterudites.
Contribution
The paper develops an LLM-based predictive model trained on compositions curated from over 300 research articles, enabling efficient identification of promising thermoelectric dopants and filler compositions.
Findings
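The LLM-based model achieves significantly lower mean-squared error than deep neural networks built on elemental descriptors such as atomic radii.
The trained model is used to propose novel filler compositions with promising predicted thermoelectric properties.
Density functional theory and molecular dynamics calculations of electrical and thermal conductivity support the predicted candidates.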
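For reference, thermoelectric performance is conventionally quantified by the dimensionless figure of merit. The standard textbook definition is given below; it is not spelled out in this summary, and the symbols are the usual ones rather than notation taken from the paper.

$$
zT = \frac{S^{2}\,\sigma\,T}{\kappa}
$$

Here $S$ is the Seebeck coefficient, $\sigma$ the electrical conductivity, $T$ the absolute temperature, and $\kappa$ the total thermal conductivity. A good thermoelectric therefore needs high electrical conductivity together with low thermal conductivity.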
Abstract
We present a data-driven approach for accelerating the discovery of high-performance CoSb$_3$-based skutterudites by curating a comprehensive dataset of compositions with various filler elements from over 300 research articles. Leveraging large language models (LLMs), we extract and embed compositional representations, which are then used to train a regression head for predicting the thermoelectric figure of merit. Compared to traditional deep neural networks relying on elemental descriptors such as atomic radii, our LLM-based model achieves significantly lower mean-squared error losses. We further employ the trained model to propose novel filler compositions with promising thermoelectric properties. Finally, we support these predicted candidates through density functional theory and molecular dynamics calculations to assess their electrical and thermal conductivity. This data-driven…
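The embed-then-regress pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed_composition` is a hypothetical stand-in that hashes the formula string into a pseudo-random vector (a real pipeline would call an LLM embedding model), the regression head is a closed-form ridge regressor chosen for brevity, the formula strings are only illustrative, and the targets are synthetic numbers rather than measured figures of merit.

```python
import hashlib

import numpy as np

EMBED_DIM = 16  # hypothetical size; real LLM embeddings are far larger


def embed_composition(formula: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Placeholder for an LLM embedding (assumption, NOT a real model).

    Returns a deterministic pseudo-random vector seeded by a hash of the
    composition string, so identical formulas map to identical vectors.
    """
    digest = hashlib.sha256(formula.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).standard_normal(dim)


def _with_bias(X: np.ndarray) -> np.ndarray:
    """Append a constant column so the head can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_regression_head(X: np.ndarray, y: np.ndarray, l2: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression: w = (Xb^T Xb + l2 * I)^-1 Xb^T y."""
    Xb = _with_bias(X)
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply the fitted linear head to a batch of embeddings."""
    return _with_bias(X) @ w


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean-squared error, the loss named in the abstract."""
    return float(np.mean((y_true - y_pred) ** 2))


# Illustrative filled-skutterudite formula strings (not taken from the paper).
formulas = ["Yb0.2Co4Sb12", "Ba0.3Co4Sb12", "In0.2Co4Sb12", "La0.1Co4Sb12"]
X = np.stack([embed_composition(f) for f in formulas])

# Synthetic targets from a random linear map: placeholders, not measured data.
y = X @ np.random.default_rng(0).standard_normal(EMBED_DIM)

w = fit_regression_head(X, y)
print("train MSE:", mse(y, predict(X, w)))
```

In practice the head would be evaluated on held-out compositions, and candidate fillers would be ranked by the predicted value before being passed to density functional theory and molecular dynamics calculations.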
