MotifPiece: A Data-Driven Approach for Effective Motif Extraction and Molecular Representation Learning
Zhaoning Yu, Hongyang Gao

TL;DR
MotifPiece is a novel data-driven method for extracting molecular motifs that preserves topological information, improving molecular representation learning by leveraging statistical measures and heterogeneous learning modules.
Contribution
The paper introduces MotifPiece, a new motif extraction technique that overcomes limitations of rule-based and string-based methods, and reports improved performance from two strategies: incorporating additional data and merging datasets that share motifs.
Findings
MotifPiece outperforms previous models in motif extraction tasks.
Incorporating more data improves motif vocabulary richness.
Merging datasets sharing motifs enhances model performance.
Abstract
Motif extraction is an important task in motif-based molecular representation learning. Previously, machine learning approaches have employed either rule-based or string-based techniques to extract motifs. Rule-based approaches may extract motifs that aren't frequent or prevalent within the molecular data, which can lead to an incomplete understanding of essential structural patterns in molecules. String-based methods often lose the topological information inherent in molecules. This can be a significant drawback because topology plays a vital role in defining the spatial arrangement and connectivity of atoms within a molecule, which can be critical for understanding its properties and behavior. In this paper, we develop a data-driven motif extraction technique known as MotifPiece, which employs statistical measures to define motifs. To comprehensively evaluate the effectiveness of…
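The abstract does not say which statistical measure MotifPiece uses, so the following is only a minimal sketch of the general idea of data-driven motif counting on graphs, not the paper's algorithm. It assumes that the statistic is simple support (the number of molecules containing a candidate) and that candidates are bonded atom pairs. The toy molecules, the function names `count_bonded_pairs` and `frequent_pairs`, and the `min_support` parameter are all hypothetical.

```python
from collections import Counter

# Toy molecules as labeled graphs: atom symbols plus bonds as index pairs.
# Illustrative structures only, not taken from the paper's datasets.
MOLECULES = [
    {"atoms": ["C", "C", "O"], "bonds": [(0, 1), (1, 2)]},  # C-C-O skeleton
    {"atoms": ["C", "O"], "bonds": [(0, 1)]},               # C-O skeleton
    {"atoms": ["C", "C", "N"], "bonds": [(0, 1), (1, 2)]},  # C-C-N skeleton
]


def count_bonded_pairs(molecules):
    """Count, for each bonded atom pair, how many molecules contain it."""
    counts = Counter()
    for mol in molecules:
        seen = set()  # count each pair at most once per molecule
        for i, j in mol["bonds"]:
            # Sort the two symbols so that C-O and O-C are the same candidate.
            pair = tuple(sorted((mol["atoms"][i], mol["atoms"][j])))
            seen.add(pair)
        counts.update(seen)
    return counts


def frequent_pairs(molecules, min_support):
    """Keep candidates found in at least `min_support` molecules (assumed rule)."""
    counts = count_bonded_pairs(molecules)
    return {pair: n for pair, n in counts.items() if n >= min_support}


PAIR_COUNTS = count_bonded_pairs(MOLECULES)
FREQUENT = frequent_pairs(MOLECULES, min_support=2)
```

Because candidates are read off the bond list rather than a linearized string, each one corresponds to an actual connection in the molecular graph, which is the property the abstract says string-based methods tend to lose. Selecting by support also addresses the stated weakness of rule-based extraction, where a motif may be rare in the data.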
Taxonomy
Topics: Computational Drug Discovery Methods · Chemical Synthesis and Analysis · Machine Learning in Materials Science
