Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design
Jiannan Yang, Veronika Thost, Tengfei Ma

TL;DR
This paper systematically investigates masking strategies in self-supervised learning for molecular graphs. It finds that the choice of prediction target and encoder architecture matters more than sophisticated masking distributions, a result intended to guide future SSL development.
Contribution
It introduces a probabilistic framework for evaluating masking strategies and provides empirical insights into their effectiveness in molecular graph SSL.
Findings
Uniform masking performs comparably to complex distributions.
Semantic prediction targets significantly improve downstream task performance.
Graph Transformer encoders benefit more from richer prediction targets.
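To make the uniform-masking baseline from the findings above concrete, here is a minimal sketch, not the authors' implementation. It assumes a molecule is given as a list of atom-type strings, that every atom is equally likely to be masked, and that the prediction target is simply the original atom type. The function names `uniform_mask` and `apply_mask`, the `[MASK]` token, and the mask ratio are all hypothetical choices made for illustration.

```python
import random


def uniform_mask(num_atoms, mask_ratio=0.15, seed=None):
    """Pick atom indices to mask; every atom is equally likely (uniform)."""
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    if num_atoms <= 0:
        return []
    rng = random.Random(seed)
    # Mask at least one atom so the pretraining loss is never empty.
    num_masked = min(num_atoms, max(1, round(num_atoms * mask_ratio)))
    return sorted(rng.sample(range(num_atoms), num_masked))


def apply_mask(atom_types, masked_idx, mask_token="[MASK]"):
    """Return the corrupted atom list and the targets to reconstruct."""
    masked = set(masked_idx)
    corrupted = [mask_token if i in masked else a
                 for i, a in enumerate(atom_types)]
    # Assumed target: the original atom type at each masked position.
    targets = {i: atom_types[i] for i in masked_idx}
    return corrupted, targets


# Toy molecule (illustrative atom list only, no bond information).
atoms = ["C", "C", "O", "N", "C", "C", "C", "O"]
idx = uniform_mask(len(atoms), mask_ratio=0.25, seed=0)
corrupted, targets = apply_mask(atoms, idx)
print(corrupted)
print(targets)
```

A structured masking distribution would replace only the sampling step in `uniform_mask` (for example, by weighting atoms or masking connected subgraphs), and a richer prediction target would replace only the `targets` dictionary, which is one way to see the design dimensions as independent knobs.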
Abstract
Self-supervised learning (SSL) plays a central role in molecular representation learning. Yet, many recent innovations in masking-based pretraining are introduced as heuristics and lack principled evaluation, obscuring which design choices are genuinely effective. This work casts the entire pretrain-finetune workflow into a unified probabilistic framework, enabling a transparent comparison and deeper understanding of masking strategies. Building on this formalism, we conduct a controlled study of three core design dimensions: masking distribution, prediction target, and encoder architecture, under rigorously controlled settings. We further employ information-theoretic measures to assess the informativeness of pretraining signals and connect them to empirically benchmarked downstream performance. Our findings reveal a surprising insight: sophisticated masking distributions offer no…
Taxonomy
Topics: Advanced Graph Neural Networks · Computational Drug Discovery Methods · Machine Learning in Materials Science
