FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
Xuan Liu, Siru Ouyang, Xianrui Zhong, Jiawei Han, Huimin Zhao

TL;DR
FGBench introduces a large, annotated dataset for molecular property reasoning at the functional group level, aiming to improve LLMs' interpretability and reasoning in chemistry tasks.
Contribution
The paper presents FGBench, a novel dataset with detailed functional group annotations for molecular property reasoning, facilitating fine-grained understanding in LLMs.
Findings
Current LLMs struggle with functional group-level reasoning.
FGBench enables the development of more interpretable, structure-aware LLMs.
The dataset supports diverse tasks including regression and classification across multiple functional groups.
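As a rough illustration of the findings above, a functional-group-level reasoning problem can be pictured as a molecule paired with localized group annotations and a target that is either a class label or a numeric value. The sketch below is not FGBench's actual schema or evaluation code; the class names, field names, scoring rule, and example target value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FunctionalGroup:
    name: str                      # hypothetical label, e.g. "hydroxyl"
    atom_indices: Tuple[int, ...]  # heavy-atom positions of the group in the molecule


@dataclass
class FGProblem:
    smiles: str                    # molecule as a SMILES string
    groups: List[FunctionalGroup]  # localized functional group annotations
    task: str                      # "classification" or "regression"
    target: Union[bool, float]     # class label or numeric property value


def score(problem: FGProblem, prediction: Union[bool, float]) -> float:
    """Return 1.0 or 0.0 for classification, absolute error for regression."""
    if problem.task == "classification":
        return 1.0 if bool(prediction) == bool(problem.target) else 0.0
    if problem.task == "regression":
        return abs(float(prediction) - float(problem.target))
    raise ValueError(f"unknown task: {problem.task}")


# Ethanol ("CCO"): atom index 2 is the oxygen of the hydroxyl group.
example = FGProblem(
    smiles="CCO",
    groups=[FunctionalGroup("hydroxyl", (2,))],
    task="regression",
    target=-0.5,  # made-up placeholder, not a measured property
)
```

Keeping the atom indices alongside the group name is what would let a model, or a scorer, tie an answer back to a specific site in the molecule rather than to the molecule as a whole.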
Abstract
Large language models (LLMs) have gained significant attention in chemistry. However, most existing datasets center on molecular-level property prediction and overlook the role of fine-grained functional group (FG) information. Incorporating FG-level data can provide valuable prior knowledge that links molecular structures with textual descriptions, which can be used to build more interpretable, structure-aware LLMs for reasoning on molecule-related tasks. Moreover, LLMs can learn from such fine-grained information to uncover hidden relationships between specific functional groups and molecular properties, thereby advancing molecular design and drug discovery. Here, we introduce FGBench, a dataset comprising 625K molecular property reasoning problems with functional group information. Functional groups are precisely annotated and localized within the molecule, which ensures the…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
