Sampling Latent Material-Property Information From LLM-Derived Embedding Representations
Luke P. J. Gilligan, Matteo Cobelli, Hasan M. Sayeed, Taylor D. Sparks, and Stefano Sanvito

TL;DR
This paper explores how large language model-derived embeddings can capture latent material-property information, assessing their potential to inform materials science predictions without additional training.
Contribution
It demonstrates that LLM embeddings can reflect certain material properties, highlighting the importance of context and comparison methods for effective extraction.
Findings
LLM embeddings can encode some material property information
Optimal contextual clues are necessary for extracting meaningful data
LLMs have potential for generating useful materials representations
Abstract
Vector embeddings derived from large language models (LLMs) show promise in capturing latent information from the literature. Interestingly, these can be integrated into material embeddings, potentially useful for data-driven predictions of materials properties. We investigate the extent to which LLM-derived vectors capture the desired information and their potential to provide insights into material properties without additional training. Our findings indicate that, although LLMs can be used to generate representations reflecting certain property information, extracting the embeddings requires identifying the optimal contextual clues and appropriate comparators. Despite this restriction, it appears that LLMs still have the potential to be useful in generating meaningful materials-science representations.
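One way to picture the "contextual clues and appropriate comparators" idea is to score each material vector against the vector of a property-related comparator phrase, with no further training. The sketch below is only illustrative: the abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the three-dimensional vectors, the material names, and the helper `rank_by_comparator` are hypothetical stand-ins for real LLM-derived embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_comparator(material_vectors, comparator_vector):
    """Sort materials by similarity to a comparator vector, highest first.

    Hypothetical helper: in practice both inputs would be embeddings
    produced by an LLM, which this sketch does not call.
    """
    scores = {
        name: cosine_similarity(vec, comparator_vector)
        for name, vec in material_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy placeholder vectors, NOT real embeddings or data from the paper.
toy_materials = {
    "material_A": [0.9, 0.1, 0.0],
    "material_B": [0.2, 0.8, 0.1],
    "material_C": [0.5, 0.5, 0.5],
}
# Stand-in for the embedding of a property-related comparator phrase.
toy_comparator = [1.0, 0.0, 0.0]

ranking = rank_by_comparator(toy_materials, toy_comparator)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Under this framing, swapping in a different comparator phrase or embedding the material name with different surrounding context changes the ranking, which is one reading of why the choice of both matters.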
Taxonomy
Topics: Image Processing and 3D Reconstruction · Welding Techniques and Residual Stresses · Non-Destructive Testing Techniques
