GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning
Haiteng Zhao, Shengchao Liu, Chang Ma, Hannan Xu, Jie Fu, Zhi-Hong Deng, Lingpeng Kong, Qi Liu

TL;DR
GIMLET is a unified graph-text model that enables instruction-based zero-shot molecule property prediction, effectively leveraging textual instructions and graph structures without additional graph encoding modules.
Contribution
This paper introduces GIMLET, a novel model that unifies language and graph encoding for zero-shot molecule tasks, improving generalization and performance over existing methods.
Findings
GIMLET outperforms baseline models in instruction-based zero-shot learning.
GIMLET achieves results close to supervised GNN models on key benchmarks.
The model effectively encodes both graph structures and instructions without extra modules.
Abstract
Molecule property prediction has gained significant attention in recent years. The main bottleneck is label insufficiency caused by expensive lab experiments. To alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples encoding of the graph from task instructions in the attention…
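The abstract's two mechanisms, a generalized position embedding shared by graph nodes and instruction tokens, and an attention pattern that decouples graph encoding from the instruction, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not confirmed by the excerpt above: graph-to-graph positions are taken to be shortest-path distances, text-to-text positions are ordinary relative offsets, graph-text pairs share a single hypothetical `CROSS` marker, and decoupling is modeled by forbidding graph nodes from attending to text tokens. The function names are likewise hypothetical.

```python
from collections import deque

CROSS = "cross"  # hypothetical marker for any graph<->text pair


def shortest_path_lengths(num_nodes, edges):
    """All-pairs shortest-path lengths via BFS; unreachable pairs stay None."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for s in range(num_nodes):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def build_position_and_mask(num_nodes, edges, num_text_tokens):
    """Build pairwise positions and an attention mask for one joint sequence.

    The sequence is laid out as [graph nodes..., text tokens...].
    position[i][j] describes how query i sees key j.
    allowed[i][j] is True when query i may attend to key j.
    """
    n = num_nodes + num_text_tokens
    spd = shortest_path_lengths(num_nodes, edges)
    position = [[None] * n for _ in range(n)]
    allowed = [[True] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_graph, j_graph = i < num_nodes, j < num_nodes
            if i_graph and j_graph:
                # Assumption: graph structure enters as shortest-path distance.
                position[i][j] = ("graph", spd[i][j])
            elif not i_graph and not j_graph:
                # Standard relative offset between text tokens.
                position[i][j] = ("text", j - i)
            else:
                position[i][j] = (CROSS, None)
            # Assumption: graph nodes never attend to instruction tokens,
            # so the graph representation does not depend on the task text.
            if i_graph and not j_graph:
                allowed[i][j] = False
    return position, allowed


# Usage: a 3-atom chain (0-1-2) followed by 2 instruction tokens.
position, allowed = build_position_and_mask(3, [(0, 1), (1, 2)], 2)
```

In a real model, each position entry would index a learned bias or embedding added to the attention scores, and `allowed` would be applied as a mask before the softmax. The sketch only shows how a single transformer could consume both modalities without a separate graph encoder.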
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Machine Learning in Bioinformatics
