MatExpert: Decomposing Materials Discovery by Mimicking Human Experts
Qianggang Ding, Santiago Miret, Bang Liu

TL;DR
MatExpert is a novel framework that mimics human experts by combining LLMs and contrastive learning to accelerate solid-state material discovery through retrieval, transition, and generation stages.
Contribution
It introduces a new three-stage approach inspired by human workflows, integrating LLMs and contrastive learning for improved material design and discovery.
Findings
Outperforms state-of-the-art methods in material generation tasks
Achieves higher validity, distribution, and stability in generated materials
Demonstrates effectiveness across various experimental metrics
Abstract
Material discovery is a critical research area with profound implications for various industries. In this work, we introduce MatExpert, a novel framework that leverages Large Language Models (LLMs) and contrastive learning to accelerate the discovery and design of new solid-state materials. Inspired by the workflow of human materials design experts, our approach integrates three key stages: retrieval, transition, and generation. First, in the retrieval stage, MatExpert identifies an existing material that closely matches the desired criteria. Second, in the transition stage, MatExpert outlines the modifications needed to transform this material formulation so that it meets the specific requirements of the initial user query. Third, in the generation stage, MatExpert performs detailed computations and structural generation to create new materials based on the provided information. Our…
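The retrieval stage can be pictured as a nearest-neighbour search over material embeddings. The sketch below is a minimal, hypothetical illustration, not MatExpert's implementation: it assumes that each candidate material and the user query have already been encoded as vectors (here, toy 3-d embeddings with made-up names such as `material_A`) and ranks the materials by cosine similarity. The transition and generation stages, which the abstract attributes to the LLM, are not modelled.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve(query: Sequence[float], database: Dict[str, List[float]], k: int = 1) -> List[str]:
    """Return the ids of the k materials whose embeddings are most similar to the query."""
    ranked = sorted(database, key=lambda mid: cosine(query, database[mid]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings; a real system would obtain these from a trained encoder.
materials = {
    "material_A": [0.9, 0.1, 0.0],
    "material_B": [0.1, 0.9, 0.2],
    "material_C": [0.0, 0.2, 0.95],
}
query = [0.2, 0.8, 0.1]  # hypothetical embedding of the user's target-property query
print(retrieve(query, materials))  # → ['material_B']
```

The retrieved material would then serve as the starting point that the transition stage modifies toward the query's requirements.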
Peer Reviews
Decision: ICLR 2025 Poster
1. The design of MatExpert mirrors the expert-driven process in materials science, breaking material generation down into retrieval, transition, and generation stages. This structured approach allows for iterative refinement.
2. The transition stage uses a chain-of-thought (CoT) reasoning process, enabling the model to outline logical, step-by-step modifications to meet target properties. This sequential reasoning contributes to the model's high accuracy in conditional generation tasks.
3. By c
1. While the multi-stage design of MatExpert improves accuracy, it adds computational complexity and potentially increases training time compared to single-step models.
2. The proposed framework is prone to cumulative errors: if the material retrieved in the first step is far from the target, the deviation is difficult to correct later, which affects the results of subsequent steps.
3. This paper focuses on innovation in application scenarios, and the technological innovation is relatively lim
- The application of LLMs to materials is interesting, and materials discovery is important.
- The evaluation metrics include stability computed with DFT, not just proxy metrics.
- The writing style and related work are good.
- Details are lacking on which data was used for which task. For the unconditional results on MP-20, it is unclear whether the NOMAD data was also used to train MatExpert. For the conditional results, were CrystalLLM and MatExpert trained on the same data?
- The results in Figure 5 are not well quantified, i.e., it is not clear that MatExpert is better. Also, there are 11 bars but only 9 labels; did I miss something? Some of the colors are very similar, which makes the figure hard to parse quickly.
- There is n
* The integration of Robocrystallographer enriches crystal data with textual descriptions, enhancing the retrieval process and interpretability.
* MatExpert achieves impressive performance on benchmarks, demonstrating its reliability in generating valid and diverse material structures.
* Contrastive learning effectively maps structure and property embeddings, which is a novel approach for aligning multimodal material data.
* The novelty of multi-stage material generation is somewhat limited, as it has been studied in other works [1, 2, 3]. In the introduction, the authors state that the drawback of current methods is single-step material structure generation. However, some of the cited papers already include multi-step material generation and property queries [1, 2, 3]. More discussion of those methods would be helpful.
* The paper could benefit from more clarity on the pathway generation process. Specifically, i
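The reviews note that MatExpert uses contrastive learning to align structure and property embeddings. This page does not state the exact objective, so the following is only a sketch under an assumed CLIP-style symmetric InfoNCE loss: the i-th structure embedding and the i-th property embedding form a positive pair, and every other pairing in the batch acts as a negative. The function names and the default `temperature` of 0.07 are illustrative assumptions, not details from the paper.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a (non-zero) vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax_at(row: Sequence[float], i: int) -> float:
    """log(softmax(row)[i]), computed stably with the log-sum-exp trick."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[i] - lse


def symmetric_info_nce(struct_emb, prop_emb, temperature: float = 0.07) -> float:
    """Assumed CLIP-style loss; not necessarily MatExpert's exact objective.

    Pair i of the two lists is the positive; all other pairs in the batch are negatives.
    """
    s = [_normalize(v) for v in struct_emb]
    p = [_normalize(v) for v in prop_emb]
    n = len(s)
    # logits[i][j] = cosine(s_i, p_j) / temperature
    logits = [[sum(a * b for a, b in zip(s[i], p[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Structure -> property direction: softmax over each row.
    loss_s2p = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Property -> structure direction: softmax over each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_p2s = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_s2p + loss_p2s)


# Toy 2-d embeddings: correctly paired batches should score a lower loss than swapped ones.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # → True
```

Minimizing such a loss pulls matching structure and property embeddings together in a shared space, which is what would make a similarity-based retrieval step meaningful.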
Taxonomy
Topics: Machine Learning in Materials Science · Mineral Processing and Grinding · Image Processing and 3D Reconstruction
Methods: Contrastive Learning
