Equivariant Spherical Transformer for Efficient Molecular Modeling

Junyi An; Xinyu Lu; Chao Qu; Yunfei Shi; Peijia Lin; Qianwei Tang; Licheng Xu; Fenglei Cao; Yuan Qi

arXiv:2505.23086·cs.LG·September 30, 2025

Open Access · 3 Reviews

TL;DR

The paper introduces the Equivariant Spherical Transformer (EST), a novel architecture that enhances the expressiveness of molecular modeling by applying Transformer-like mechanisms to group representations while maintaining equivariance, leading to state-of-the-art results.

Contribution

EST is a new plug-and-play framework that improves the expressiveness of equivariant GNNs for molecular modeling by integrating Fourier transforms and Transformer architecture.

Findings

1. EST achieves state-of-the-art performance on OC20 and QM9 benchmarks.
2. Small EST-based models outperform larger models with more data.
3. Theoretical and experimental validation confirms EST's equivariance properties.

Abstract

Equivariant Graph Neural Networks (GNNs) have significantly advanced the modeling of 3D molecular structure by leveraging group representations. However, their message passing, heavily relying on Clebsch-Gordan tensor product convolutions, suffers from restricted expressiveness due to the limited non-linearity and low degree of group representations. To overcome this, we introduce the Equivariant Spherical Transformer (EST), a novel plug-and-play framework that applies a Transformer-like architecture to the Fourier spatial domain of group representations. EST achieves higher expressiveness than conventional models while preserving the crucial equivariant inductive bias through a uniform sampling strategy of spherical Fourier transforms. As demonstrated by our experiments on challenging benchmarks like OC20 and QM9, EST-based models achieve state-of-the-art performance. For the complex…
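
The pipeline the abstract describes (map group-representation coefficients to samples on the sphere, operate there, and map back) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: only degrees 0 and 1 are used, the real harmonics are ordered (1, x, y, z), the sample points come from the common textbook Fibonacci lattice (not necessarily the paper's Eq. 11), and a pointwise `tanh` stands in for the Transformer block, which is not reproduced here. The helper names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(num_points):
    """Near-uniform unit vectors (common Fibonacci-lattice form; an assumption)."""
    i = np.arange(num_points) + 0.5
    z = 1.0 - 2.0 * i / num_points                 # evenly spaced heights in (-1, 1)
    golden = (1.0 + np.sqrt(5.0)) / 2.0
    phi = 2.0 * np.pi * i / golden                 # golden-ratio azimuth steps
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def real_sh_basis(points):
    """Orthonormal real harmonics of degree 0 and 1, ordered (1, x, y, z)."""
    k0 = np.sqrt(1.0 / (4.0 * np.pi))
    k1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.concatenate([np.full((len(points), 1), k0), k1 * points], axis=1)

def to_sphere(coeffs, points):
    """Inverse transform: evaluate the band-limited signal at each sample point."""
    return real_sh_basis(points) @ coeffs

def from_sphere(values, points):
    """Forward transform by equal-weight quadrature (each point covers 4*pi/S)."""
    return (4.0 * np.pi / len(points)) * real_sh_basis(points).T @ values

def spatial_nonlinearity(coeffs, points):
    """Stand-in for a spatial-domain block: pointwise tanh, then project back."""
    return from_sphere(np.tanh(to_sphere(coeffs, points)), points)

def make_rotation(a, b):
    """Rotation about z by angle a composed with rotation about x by angle b."""
    rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(b), -np.sin(b)],
                   [0.0, np.sin(b),  np.cos(b)]])
    return rz @ rx

def rotate_coeffs(coeffs, rotation):
    """Degree 0 is invariant; degree 1 in (x, y, z) order rotates like a vector."""
    return np.concatenate([coeffs[:1], rotation @ coeffs[1:]])

def equivariance_error(coeffs, num_points, rotation=None):
    """Largest gap between 'rotate then apply' and 'apply then rotate'."""
    if rotation is None:
        rotation = make_rotation(0.7, -1.1)
    points = fibonacci_sphere(num_points)
    rotate_first = spatial_nonlinearity(rotate_coeffs(coeffs, rotation), points)
    rotate_last = rotate_coeffs(spatial_nonlinearity(coeffs, points), rotation)
    return np.max(np.abs(rotate_first - rotate_last))

if __name__ == "__main__":
    c = np.array([0.3, 0.5, -0.2, 0.8])
    for s in (16, 256, 4096):
        print(s, equivariance_error(c, s))
```

With exact integration over the sphere the two orderings in `equivariance_error` would agree; in this sketch the remaining gap comes from the equal-weight quadrature on a finite lattice, so the printed values give a rough sense of how the sampling density affects equivariance in practice.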

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. Applying attention mechanisms in the spherical spatial domain is an interesting alternative to tensor product operations
2. Testing on both OC20 and QM9 with multiple metrics

Weaknesses

1. Undefined Notations and Poor Presentation:
   - "EST (with GA)" in Table 3 is never defined in the main text
   - Multiple undefined abbreviations: EwT (Table 2), PW-Linear, DTP
   - Inconsistent notation (S² vs S^2, multiple uses of C for different dimensions)
2. Theorem 1: Claims "strict SO(3)-equivariance" but this is impossible with finite sampling.
3. Fibonacci lattice sampling (Eq. 11): incorrect formula and missing definitions. The manuscript writes the FL coordinates as $ \vec p_s = \big[ p_1

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The proposed theoretical framework presents valuable insights that could substantially contribute to the future development of equivariant architectures.

Weaknesses

1. The primary weakness lies in the insufficient experimental evaluation. Specifically, the experiments lack efficiency comparisons against existing architectures such as Equiformer, eSEN, and Equiformer V2. Additionally, the introduction of the mixture-of-hybrid-experts module appears orthogonal to the proposed theoretical framework, which diminishes the perceived contribution of the spherical attention mechanism. It gives the impression that the theoretical innovation provides limited practica

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The authors provide a good explanation of the Spherical Fourier Transform and of their proposed method. The writing is overall easy to follow and understand.
- The fact that EST is more expressive than tensor-product-based models is a very strong and novel contribution.
- EST can be integrated into existing model designs, making it a flexible approach that can build on existing works.

Weaknesses

- The authors do not have a very convincing set of experiments. OC20 and QM9 are older and relatively saturated datasets, and most recent works on MLIPs are training and evaluating on the SPICE-MACE-OFF [1] and MPtrj datasets [2]. The paper is also missing comparisons to many recently developed MLIPs such as eSEN [3].
- The authors do not attempt to train a larger scale "foundation" model based on EST or provide ablation experiments to demonstrate the scaling of the proposed metho

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning in Materials Science · Graph Theory and Algorithms

Methods: Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Attention Is All You Need · Layer Normalization · Byte Pair Encoding