G2T-LLM: Graph-to-Tree Text Encoding for Molecule Generation with Fine-Tuned Large Language Models
Zhaoning Yu, Xiangyang Xu, Hongyang Gao

TL;DR
G2T-LLM introduces a hierarchical text encoding for molecules that leverages large language models to generate valid, diverse chemical structures with natural language interaction, showing competitive results on benchmarks.
Contribution
The paper presents a novel graph-to-tree text encoding method for molecule generation that enables LLMs to produce valid chemical structures with minimal task-specific tuning.
Findings
Achieved comparable performance to state-of-the-art methods on benchmarks.
Enabled natural language interaction for molecular design.
Generated diverse and valid molecular structures.
Abstract
We introduce G2T-LLM, a novel approach for molecule generation that uses graph-to-tree text encoding to transform graph-based molecular structures into a hierarchical text format optimized for large language models (LLMs). This encoding converts complex molecular graphs into tree-structured formats, such as JSON and XML, which LLMs are particularly adept at processing due to their extensive pre-training on these types of data. By leveraging the flexibility of LLMs, our approach allows for intuitive interaction using natural language prompts, providing a more accessible interface for molecular design. Through supervised fine-tuning, G2T-LLM generates valid and coherent chemical structures, addressing common challenges like invalid outputs seen in traditional graph-based methods. While LLMs are computationally intensive, they offer superior generalization and adaptability, enabling the…
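To make the graph-to-tree idea concrete, here is a minimal sketch of one way a molecular graph could be turned into a JSON tree and back. It is not the authors' encoding: the abstract does not specify the schema, so the field names (`id`, `element`, `bond_order`, `children`, `ring_bonds`), the depth-first traversal, and the helper functions `graph_to_tree` and `tree_to_graph` are all assumptions made for illustration. Bonds that would close a ring are stored as references to an already-visited atom, so the output remains a tree.

```python
import json


def graph_to_tree(atoms, bonds, root=0):
    """Encode a molecular graph as a nested dict (hypothetical schema).

    atoms: list of element symbols, indexed by atom id.
    bonds: list of (i, j, order) tuples.
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        adjacency[i].append((j, order))
        adjacency[j].append((i, order))

    visited = set()
    used = set()  # bonds already encoded, keyed by frozenset({i, j})

    def visit(node):
        visited.add(node)
        entry = {"id": node, "element": atoms[node],
                 "children": [], "ring_bonds": []}
        for neighbor, order in adjacency[node]:
            key = frozenset((node, neighbor))
            if key in used:
                continue
            used.add(key)
            if neighbor in visited:
                # Ring closure: keep a reference instead of a child node.
                entry["ring_bonds"].append({"to": neighbor, "order": order})
            else:
                child = visit(neighbor)
                child["bond_order"] = order  # bond to the parent atom
                entry["children"].append(child)
        return entry

    return visit(root)


def tree_to_graph(tree):
    """Decode the nested dict back into (atoms, bonds)."""
    atoms = {}
    bonds = []

    def walk(node, parent=None):
        atoms[node["id"]] = node["element"]
        if parent is not None:
            bonds.append((parent, node["id"], node["bond_order"]))
        for ring in node["ring_bonds"]:
            bonds.append((node["id"], ring["to"], ring["order"]))
        for child in node["children"]:
            walk(child, node["id"])

    walk(tree)
    return [atoms[i] for i in sorted(atoms)], bonds


if __name__ == "__main__":
    # Heavy atoms of ethanol (C-C-O), hydrogens omitted.
    ethanol = graph_to_tree(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
    print(json.dumps(ethanol, indent=2))
```

A text of this shape can be placed in a fine-tuning prompt or parsed from a model's output with `json.loads`. A real pipeline would additionally need a chemistry toolkit to check valence and other validity constraints, which this sketch does not attempt.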
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Chemical Synthesis and Analysis
