Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis

Rajan; Ishaan Gupta

arXiv:2602.20573 · cs.LG · March 10, 2026

Open Access

TL;DR

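This study benchmarks four GNN architectures for molecular property prediction, introduces a fusion framework that combines GNN embeddings with molecular fingerprints, and analyzes the learned representations with CKA. The analysis shows that embedding similarity varies widely: GCN, GraphSAGE, and GIN embeddings are highly similar to one another, while GNN and fingerprint embeddings occupy distinct latent spaces.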
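The page does not describe how the fusion framework combines the two feature types, beyond calling it hierarchical. As a point of reference only, the sketch below shows the simplest alternative, early fusion by concatenation: each block is L2-normalised and the two are joined into one feature vector for a linear regression head. The function names, the toy vectors, and the normalisation step are all illustrative assumptions, not the authors' hierarchical scheme.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [float(x) for x in vec]


def fuse(graph_embedding, fingerprint_bits):
    """Hypothetical early fusion: normalise each block, then concatenate them."""
    return l2_normalize(graph_embedding) + l2_normalize([float(b) for b in fingerprint_bits])


def linear_head(features, weights, bias=0.0):
    """Hypothetical regression head: one dot product plus a bias term."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Toy usage: a 2-d GNN embedding and a 4-bit fingerprint (both made up)
fused = fuse([3.0, 4.0], [1, 0, 1, 1])
print(len(fused))  # 6
```

Normalising each block before concatenating keeps a long fingerprint with many set bits from dominating the dot product purely through its length. A hierarchical design would presumably fuse at more than one level, which this flat sketch does not attempt.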

Contribution

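It provides a systematic comparison of GNN architectures on molecular regression datasets, proposes a hierarchical fusion approach that combines GNN embeddings with molecular fingerprints, and applies centered kernel alignment (CKA) to compare the representations the models learn.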
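CKA compares two sets of embeddings computed for the same molecules: rows must align, but the two embeddings may have different widths. The page does not say which CKA variant the authors use. The sketch below assumes the common linear form, $\mathrm{CKA}(X, Y) = \lVert Y^\top X \rVert_F^2 / (\lVert X^\top X \rVert_F \, \lVert Y^\top Y \rVert_F)$ on column-centred matrices, written in plain Python so it has no dependencies. Function names and toy embeddings are illustrative, not the authors' code.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean over the molecules."""
    n, d = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in m]


def cross_frobenius_sq(a, b):
    """Return ||A^T B||_F^2 for A (n x p) and B (n x q) stored as lists of rows."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between two embeddings of the same n molecules (rows must align)."""
    if len(x) != len(y):
        raise ValueError("x and y must embed the same molecules (same number of rows)")
    xc, yc = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    if denom == 0.0:
        raise ValueError("an embedding is constant across molecules; CKA is undefined")
    return cross_frobenius_sq(xc, yc) / denom


# Toy usage: 4 molecules, hypothetical 2-d embeddings from two models
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
rotated = [[-b, a] for a, b in x]  # x multiplied by an orthogonal matrix
print(linear_cka(x, rotated))  # close to 1.0
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so a rotated or rescaled copy of an embedding scores 1.0. A high score between, say, GCN and GIN embeddings therefore means the representations agree up to such transformations, not that their raw coordinates match.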

Findings

01

Fusion framework lowers RMSE by over 7%

02

GNN and fingerprint embeddings occupy distinct latent spaces

03

High similarity among GCN, GraphSAGE, and GIN embeddings
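The first finding does not state whether the 7% figure is absolute or relative, or which baseline it is measured against. Assuming it is a relative reduction in RMSE against a baseline model, the arithmetic works as follows. The targets and predictions are made-up toy numbers, not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction_pct(baseline, improved):
    """Percentage by which the `improved` RMSE is lower than the `baseline` RMSE."""
    return 100.0 * (baseline - improved) / baseline


# Made-up toy numbers, for illustration only
y = [1.0, 2.0, 3.0, 4.0]
baseline_rmse = rmse(y, [1.5, 2.5, 2.5, 4.5])   # every error is 0.5
fused_rmse = rmse(y, [1.46, 2.46, 2.54, 4.46])  # every error is about 0.46
print(round(relative_reduction_pct(baseline_rmse, fused_rmse), 2))  # 8.0
```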

Abstract

Molecules are commonly represented as SMILES strings, which can be readily converted to fixed-size molecular fingerprints. These fingerprints serve as feature vectors for training ML/DL models on molecular property prediction tasks in fields such as computational chemistry, drug discovery, biochemistry, and materials science. Recent research has demonstrated that SMILES can also be used to construct molecular graphs in which atoms are nodes ($V$) and bonds are edges ($E$). These graphs can then be used to train geometric DL models such as graph neural networks (GNNs). Rather than depending on fixed-size fingerprints, a GNN learns the structural relationships inherent in a molecule. Although GNNs are powerful aggregators, their efficacy on smaller datasets and the inductive biases of different architectures have been less studied. In the present study, we performed a systematic benchmarking of four different GNN architectures…
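To make the graph view concrete, here is a toy sketch that builds $G = (V, E)$ by hand for ethanol (SMILES `CCO`, hydrogens implicit) instead of parsing the SMILES string, which a cheminformatics toolkit such as RDKit would normally do. It runs one round of two simplified neighbourhood aggregations: a GIN-style sum and a GraphSAGE-style mean. Learnable weights, MLPs, and nonlinearities are omitted, and the one-hot atom features are an assumption; none of this is the authors' implementation.

```python
# Hypothetical hand-built graph for ethanol, SMILES "CCO" (hydrogens implicit)
ATOMS = ["C", "C", "O"]   # V: node index -> element symbol
BONDS = [(0, 1), (1, 2)]  # E: undirected bonds between node indices
VOCAB = ["C", "O"]        # one-hot feature vocabulary (an assumption)


def one_hot(symbol):
    return [1.0 if symbol == e else 0.0 for e in VOCAB]


def adjacency(n, edges):
    """Neighbour lists for an undirected graph with n nodes."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def vec_sum(vectors, dim):
    """Element-wise sum of a list of vectors (also used as a sum readout)."""
    total = [0.0] * dim
    for vec in vectors:
        total = [a + b for a, b in zip(total, vec)]
    return total


def gin_sum_step(h, adj, eps=0.0):
    """GIN-style update without its MLP: (1 + eps) * h_v + sum of neighbour states."""
    dim = len(h[0])
    return [
        [(1.0 + eps) * s + agg for s, agg in zip(h[v], vec_sum([h[u] for u in adj[v]], dim))]
        for v in range(len(h))
    ]


def sage_mean_step(h, adj):
    """GraphSAGE-style update without weights: concat(h_v, mean of neighbour states)."""
    dim = len(h[0])
    out = []
    for v in range(len(h)):
        nbrs = adj[v]
        agg = vec_sum([h[u] for u in nbrs], dim)
        mean = [a / len(nbrs) for a in agg] if nbrs else agg
        out.append(h[v] + mean)  # list concatenation, not addition
    return out


h0 = [one_hot(a) for a in ATOMS]
adj = adjacency(len(ATOMS), BONDS)
print(vec_sum(gin_sum_step(h0, adj), 2))    # sum readout: [5.0, 2.0]
print(vec_sum(sage_mean_step(h0, adj), 4))  # sum readout: [2.0, 1.0, 2.5, 0.5]
```

With the sum update, a node's output grows with its degree (the central carbon sees two neighbours), whereas the mean update normalises by degree. Differences of this kind are one example of the architectural inductive biases the abstract refers to.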

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Machine Learning in Materials Science