FARM: Enhancing Molecular Representations with Functional Group Awareness
Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji

TL;DR
FARM introduces functional group-aware molecular representations that integrate chemical knowledge into SMILES and graph models, leading to improved performance in molecular property prediction tasks.
Contribution
The paper presents a foundation model that combines atom-level functional group annotations with SMILES and graph representations, so that the learned molecular representations carry explicit functional-group information.
Findings
Achieved state-of-the-art results on 8 out of 13 MoleculeNet tasks.
Demonstrated strong generalization on a photostability dataset.
Improved transfer learning capabilities for drug discovery and materials science.
Abstract
We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. The key idea behind FARM is the incorporation of functional group (FG) annotations at the atomic level, enabling both FG-enhanced SMILES and FG graphs. In this representation, SMILES strings are enriched with functional group information that identifies the group membership of each atom, while the FG graph captures molecular structure by representing how functional groups are connected. This tokenization injects chemical knowledge into SMILES and expands the effective molecular vocabulary, making the representation more suitable for Transformer-based models and more aligned with natural language structure. FARM learns molecular representations from two complementary perspectives to jointly encode…
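To make the two representations described in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the `Element_group` token format, the helper names `fg_enhanced_tokens` and `fg_graph`, and the hand-assigned functional-group labels for acetone are all assumptions made for illustration. A real pipeline would detect functional groups automatically with a cheminformatics toolkit.

```python
def fg_enhanced_tokens(atoms, fg_label_of_atom):
    """Tag each atom symbol with the functional group it belongs to.

    The "Element_group" format is a hypothetical choice for this sketch.
    """
    return [f"{sym}_{fg_label_of_atom[i]}" for i, sym in enumerate(atoms)]


def fg_graph(bonds, fg_instance_of_atom):
    """Collapse atom-level bonds into edges between functional-group instances.

    Bonds inside one group are dropped; bonds crossing groups become edges.
    """
    edges = set()
    for a, b in bonds:
        ga, gb = fg_instance_of_atom[a], fg_instance_of_atom[b]
        if ga != gb:
            edges.add((min(ga, gb), max(ga, gb)))
    return sorted(edges)


# Acetone, SMILES CC(=O)C. Atom indices: 0=C, 1=C, 2=O, 3=C.
atoms = ["C", "C", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3)]

# Hand-assigned labels (assumption): the carbonyl C and O form a ketone group,
# and each methyl carbon is its own alkyl group instance.
fg_label = ["alkyl", "ketone", "ketone", "alkyl"]
fg_instance = [0, 1, 1, 2]

tokens = fg_enhanced_tokens(atoms, fg_label)
edges = fg_graph(bonds, fg_instance)
print(tokens)
print(edges)
```

In this sketch the same element symbol maps to different tokens depending on its chemical context, which is how the vocabulary grows, while the group-level graph keeps only how functional groups connect to one another.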
