Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis
Rajan, Ishaan Gupta

TL;DR
This study benchmarks four GNN architectures for molecular property prediction, introduces a framework that fuses GNN embeddings with molecular fingerprints, and analyzes the learned representations using centered kernel alignment (CKA), revealing varying degrees of similarity between embeddings.
Contribution
It provides a systematic comparison of GNN models on molecular datasets, proposes a hierarchical approach for fusing GNN and fingerprint representations, and applies CKA to characterize the models' learned representations.
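The page does not describe how the hierarchical fusion works. As a point of reference only, the sketch below shows the simplest form of GNN/fingerprint fusion, plain feature concatenation, together with the relative-RMSE arithmetic behind a claim like the "over 7%" in the Findings. Concatenation is a swapped-in baseline, not the authors' hierarchical scheme. The function names (`concat_fusion`, `relative_improvement`), the array sizes, and the RMSE values 0.50 and 0.46 are hypothetical and not taken from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between targets and predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def concat_fusion(graph_emb, fingerprint):
    """Hypothetical feature-level fusion: stack the learned GNN embedding
    and the fixed fingerprint bits side by side for each molecule."""
    return np.concatenate([graph_emb, fingerprint.astype(float)], axis=1)

def relative_improvement(rmse_baseline, rmse_fused):
    """Percentage reduction in RMSE relative to the baseline model."""
    return 100.0 * (rmse_baseline - rmse_fused) / rmse_baseline

# Hypothetical sizes: 4 molecules, 8-dim GNN embedding, 16-bit fingerprint
rng = np.random.default_rng(0)
graph_emb = rng.normal(size=(4, 8))
fingerprint = rng.integers(0, 2, size=(4, 16))
fused = concat_fusion(graph_emb, fingerprint)
print(fused.shape)  # (4, 24)

# Illustrative numbers only, not results from the paper
print(relative_improvement(0.50, 0.46))  # approximately 8.0
```

A fused vector like this would feed a downstream regressor. A hierarchical scheme would presumably combine the two sources in more than one stage, but those details are not given here.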
Findings
Fusion framework improves RMSE by over 7%
GNN and fingerprint embeddings occupy distinct latent spaces
High similarity among GCN, GraphSAGE, and GIN embeddings
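The similarity findings above come from CKA scores computed between embedding matrices whose rows refer to the same molecules. Below is a minimal sketch of linear CKA. It assumes the linear-kernel variant, and the paper may use a different kernel. The random matrices are stand-ins, not outputs of GCN, GraphSAGE, GIN, or any fingerprint model.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representations of the same n samples.

    X: (n_samples, d1), Y: (n_samples, d2). Returns a value in [0, 1].
    """
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of rows (samples)")
    # Center each feature column so mean offsets do not affect the score
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Cross-similarity normalized by each representation's self-similarity
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(200, 64))             # stand-in embeddings from "model A"
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
emb_b = 3.0 * emb_a @ Q                         # rotated and rescaled copy of A
emb_c = rng.normal(size=(200, 32))             # unrelated stand-in embeddings

print(linear_cka(emb_a, emb_b))  # 1.0 up to floating-point error
print(linear_cka(emb_a, emb_c))  # much lower, though not zero for finite samples
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling, which is why `emb_b` scores as identical to `emb_a`. This property lets embeddings of different widths from different architectures be compared directly.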
Abstract
Molecules are commonly represented as SMILES strings, which can be readily converted into fixed-size molecular fingerprints. These fingerprints serve as feature vectors for training ML/DL models on molecular property prediction tasks in computational chemistry, drug discovery, biochemistry, and materials science. Recent research has shown that SMILES can also be used to construct molecular graphs in which atoms are nodes and bonds are edges. These graphs can then be used to train geometric DL models such as GNNs. A GNN learns the inherent structural relationships within a molecule rather than relying on fixed-size fingerprints. Although GNNs are powerful aggregators, their efficacy on smaller datasets and the inductive biases of different architectures are less studied. In our present study, we performed a systematic benchmarking of four different GNN architectures…
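The abstract describes GNNs as aggregators over a molecular graph in which atoms are nodes and bonds are edges. The sketch below shows one GCN-style propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied twice to the heavy atoms of ethanol (C-C-O), followed by mean pooling into a fixed-size molecule vector. This is an illustration under assumptions. The graph is written by hand rather than parsed from SMILES, the weights are random and untrained, and the GCN, GraphSAGE, and GIN variants benchmarked in the paper may aggregate differently.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN-style propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])            # self-loops: each atom keeps its own features
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric degree normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # aggregate, transform, ReLU

# Toy molecular graph: heavy atoms of ethanol (C-C-O), hydrogens omitted.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
# One-hot atom-type node features; columns = [C, O]
H0 = np.array([[1, 0],
               [1, 0],
               [0, 1]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(size=(2, 8))   # random, untrained weights (illustration only)
W2 = rng.normal(size=(8, 8))

H1 = gcn_layer(A, H0, W1)      # each atom mixes in its bonded neighbours
H2 = gcn_layer(A, H1, W2)      # after two hops the terminal carbon is influenced by the oxygen
graph_embedding = H2.mean(axis=0)  # mean pooling: one fixed-size vector per molecule
print(H2.shape, graph_embedding.shape)  # (3, 8) (8,)
```

Unlike a fingerprint, `graph_embedding` is produced by learnable weights. Training would tune `W1` and `W2` for the target property instead of fixing the features in advance.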
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Machine Learning in Materials Science
