A quantitative analysis of knowledge-learning preferences in large language models in molecular science

Pengfei Liu; Jun Tao; Zhixiang Ren

arXiv:2402.04119 · cs.LG · January 6, 2025 · 1 citation

TL;DR

This paper introduces a multi-modal benchmark, ChEBI-20-MM, and an analysis framework to quantify how well large language models match different data modalities in molecular science and which knowledge they preferentially learn.

Contribution

It proposes a novel benchmark and a statistical approach to analyze modality compatibility and knowledge preferences in large language models for molecular applications.

Findings

01. Identified key data modalities for specific molecular tasks
02. Developed a statistically interpretable method for knowledge mapping
03. Provided insights into model-data modality compatibility

Abstract

Deep learning has significantly advanced molecular modeling and design, enabling efficient understanding and discovery of novel molecules. In particular, large language models (LLMs) introduce a fresh research paradigm to tackle scientific problems from a natural language processing (NLP) perspective. LLMs significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns. However, two key issues remain: how to quantify the match between model and data modalities and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the model's compatibility with data modalities and knowledge acquisition. Through the modal transition probability matrix, we…

Code & Models

Repositories

ai-hpc-research-team/slm4mol (PyTorch, official)

Taxonomy

Topics: Topic Modeling