SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing
Mingxu Zhang, Yuhan Li, Lujundong Li, Dazhong Shen, Hui Xiong, Ying Sun

TL;DR
SLIM introduces a sparse, interpretable framework for molecular editing with large language models, enabling precise property control and improved success rates across multiple architectures and properties.
Contribution
SLIM proposes a sparse autoencoder with learnable importance gates that decomposes an LLM editor's hidden states into sparse, property-aligned features for molecular editing.
Findings
Achieved an improvement of up to 42.4 points on the MolEditRL benchmark.
Enhanced property control without modifying model parameters.
Supported interpretable analysis of editing behavior.
Abstract
Large language models possess strong chemical reasoning capabilities, making them effective molecular editors. However, property-relevant information is implicitly entangled across their dense hidden states, providing no explicit handle for property control: a substantial fraction of edits fail to improve or even degrade target properties. To address these issues, we propose SLIM (Sparse Latent Interpretable Molecular editing), a plug-and-play framework that decomposes the editor's hidden states into sparse, property-aligned features via a Sparse Autoencoder with learnable importance gates. Steering in this sparse feature space precisely activates property-relevant dimensions, improving editing success rate without modifying model parameters. The same sparse basis further supports interpretable analysis of editing behavior. Experiments on the MolEditRL benchmark across four model…
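To make the mechanism in the abstract concrete, the following is a minimal NumPy sketch of steering in a sparse feature space. It is not the authors' implementation: the class `GatedSparseAutoencoder`, the function `steer`, the sigmoid form of the gates, the random weights, and the additive steering rule are all assumptions chosen for illustration. In a real setting the encoder, decoder, and gates would be trained on the editor's hidden states.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class GatedSparseAutoencoder:
    """Hypothetical sketch: encoder -> ReLU -> per-feature gate -> decoder."""

    def __init__(self, d_model, d_features, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for trained parameters (assumption).
        self.W_enc = rng.normal(0.0, 0.1, (d_model, d_features))
        self.b_enc = np.zeros(d_features)
        self.W_dec = rng.normal(0.0, 0.1, (d_features, d_model))
        self.b_dec = np.zeros(d_model)
        # One importance logit per feature; assumed to be learned in practice.
        self.gate_logits = np.zeros(d_features)

    def gates(self):
        # Sigmoid maps each logit to an importance weight in (0, 1).
        return 1.0 / (1.0 + np.exp(-self.gate_logits))

    def encode(self, h):
        # Non-negative feature activations, scaled by the importance gates.
        return relu(h @ self.W_enc + self.b_enc) * self.gates()

    def decode(self, z):
        return z @ self.W_dec + self.b_dec


def steer(sae, h, feature_ids, strength):
    """Shift chosen sparse features and map the change back to hidden space.

    Only the decoded difference is added to h, so the autoencoder's
    reconstruction error is not injected into the hidden state, and the
    language model's own parameters are left untouched.
    """
    z = sae.encode(h)
    z_steered = z.copy()
    z_steered[..., feature_ids] += strength
    return h + sae.decode(z_steered) - sae.decode(z)


# Usage: steer a batch of 4 hidden states (width 8) along features 3 and 7.
sae = GatedSparseAutoencoder(d_model=8, d_features=32)
hidden = np.random.default_rng(1).normal(size=(4, 8))
steered = steer(sae, hidden, feature_ids=[3, 7], strength=2.0)
```

In this sketch the steering reduces to adding a scaled sum of the selected decoder rows to each hidden state, which is one simple way to activate chosen feature dimensions without modifying the model's weights.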
