MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science

Junho Kim; Yeachan Kim; Jun-Hyung Park; Yerim Oh; Suho Kim; SangKeun Lee

arXiv:2410.15126·cs.CL·October 22, 2024

TL;DR

MELT is a materials-aware continued pre-training method that enhances language models for materials science by integrating domain knowledge and curriculum learning, leading to improved representation and performance.

Contribution

This paper introduces MELT, a novel adaptation strategy that combines knowledge graph construction and curriculum-based training for better materials science language modeling.

Findings

01. MELT outperforms existing pre-training methods on multiple benchmarks.

02. It effectively captures materials entities and concepts.

03. It demonstrates broad applicability across materials science tasks.

Abstract

We introduce a novel continued pre-training method, MELT (MatEriaLs-aware continued pre-Training), specifically designed to efficiently adapt the pre-trained language models (PLMs) for materials science. Unlike previous adaptation strategies that solely focus on constructing domain-specific corpus, MELT comprehensively considers both the corpus and the training strategy, given that materials science corpus has distinct characteristics from other domains. To this end, we first construct a comprehensive materials knowledge base from the scientific corpus by building semantic graphs. Leveraging this extracted knowledge, we integrate a curriculum into the adaptation process that begins with familiar and generalized concepts and progressively moves toward more specialized terms. We conduct extensive experiments across diverse benchmarks to verify the effectiveness and generality of MELT. A…
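
The curriculum idea in the abstract (start from general concepts, then move toward specialized terms) can be illustrated with a small sketch. This is not the authors' implementation: the toy graph, the seed concept, the function name `curriculum_stages`, and the use of breadth-first distance as a proxy for "specialization" are all assumptions made here for illustration only.

```python
from collections import deque


def curriculum_stages(graph, seeds):
    """Group entities into stages by breadth-first distance from seed concepts.

    Assumption (hypothetical): entities closer to the general seed concepts
    are treated as more general and are scheduled earlier in the curriculum.
    """
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)

    # Collect entities per depth, then return stages from general to specialized.
    stages = {}
    for entity, d in depth.items():
        stages.setdefault(d, []).append(entity)
    return [sorted(stages[d]) for d in sorted(stages)]


# Toy semantic graph (illustrative data, not taken from the paper).
toy_graph = {
    "material": ["metal", "ceramic"],
    "metal": ["alloy"],
    "ceramic": ["perovskite"],
    "alloy": ["high-entropy alloy"],
}

stages = curriculum_stages(toy_graph, ["material"])
for i, stage in enumerate(stages):
    print(f"stage {i}: {stage}")
```

In a setup like this, each stage could determine which entities are emphasized during that phase of continued pre-training; how MELT actually scores and schedules entities is described in the paper and the repository, not here.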

Code & Models

Repositories

JunhoKim94/MELT · PyTorch

Videos

MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science · Underline

Taxonomy

Topics: Machine Learning in Materials Science

Methods: Balanced Selection · Focus