Exploiting Latent Linearity in LLMs Improves Explainable Molecular Representation Learning

Zhuoran Li; Xu Sun; Wanyu Lin; Jiannong Cao

arXiv:2410.08829·cs.LG·February 3, 2026

TL;DR

This paper introduces MoleX, a framework that exploits latent linearity in LLM representations to improve both explainability and predictive performance in molecular representation learning, with CPU inference reported as 300x faster on large datasets and far fewer parameters than traditional LLMs.

Contribution

MoleX decomposes molecular embeddings into a concept-aligned space, revealing linear structures that align with chemical principles and improve downstream task performance.

Findings

01 MoleX outperforms existing methods in accuracy and explainability.

02 It achieves 300x faster CPU inference on large datasets.

03 It uses 100,000 fewer parameters than traditional LLMs.

Abstract

Large language models (LLMs) have demonstrated broad utility across molecular domains, spanning drug discovery and materials design. Analyzing LLMs' latent representations is crucial for elucidating their underlying mechanisms, improving explainability, and ultimately advancing downstream performance. We propose MoleX, a simple yet effective framework that decomposes molecular embeddings within LLM representations into a concept-aligned space for explainable molecular representation learning. We further show that these high-dimensional embeddings admit a linear mapping onto chemically consistent concepts. Our analysis suggests that the uncovered linearity aligns with established chemical principles, indicating a mechanistically explainable latent structure in LLM representations for scientific applications. When applied to downstream tasks, this latent linearity improves both predictive…
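The idea of a linear mapping from embeddings to concept-level quantities can be illustrated with a linear probe. The sketch below is not MoleX's method and does not use the official repository: it fits a plain linear model by gradient descent on synthetic vectors that stand in for molecular embeddings. The function name `fit_linear_probe`, the three-dimensional "embeddings", and the target weights are all assumptions made for illustration.

```python
import random

def fit_linear_probe(X, y, lr=0.1, epochs=500):
    """Fit y ~ X.w + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual of the current linear prediction for this sample.
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Synthetic stand-in data (assumption): each row plays the role of a
# low-dimensional embedding, and the target is an exactly linear function of it.
random.seed(0)
true_w, true_b = [1.5, -2.0, 0.0], 0.5
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [sum(wj * xj for wj, xj in zip(true_w, xi)) + true_b for xi in X]

w, b = fit_linear_probe(X, y)
print([round(wj, 3) for wj in w], round(b, 3))
```

Because the model is linear, each learned weight can be read directly as the contribution of one input dimension to the prediction, which is the kind of explainability a linear structure makes possible. A model this small also runs comfortably on a CPU.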

Code & Models

Repositories

molex2024/molex (Official)

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · History and advancements in chemistry