SE(3)-Hyena Operator for Scalable Equivariant Learning
Artem Moskalev, Mangal Prakash, Rui Liao, Tommaso Mansi

TL;DR
The paper introduces SE(3)-Hyena, a scalable equivariant model that captures global geometric context at sub-quadratic complexity and outperforms existing methods in speed and memory on long sequences.
Contribution
It presents the SE(3)-Hyena operator, a novel equivariant long-convolutional model based on Hyena, enabling scalable global context modeling with reduced computational costs.
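To make "reduced computational costs" concrete, here is a minimal sketch of the general long-convolution idea that Hyena-style operators build on: a filter as long as the sequence is applied through the FFT in O(n log n) instead of the O(n^2) direct sum. This is an illustration under assumptions, not the authors' implementation; the names `fft`, `fft_long_conv`, and `direct_conv` are hypothetical, and the filter values are arbitrary.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_long_conv(u, h):
    """Causal convolution of u with a same-length filter h in O(n log n)."""
    n = len(u)
    size = 1
    while size < 2 * n:  # zero-pad so circular wrap-around cannot occur
        size *= 2
    U = fft([complex(v) for v in u] + [0j] * (size - n))
    H = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    Y = fft([a * b for a, b in zip(U, H)], inverse=True)
    return [y.real / size for y in Y[:n]]  # divide by size to normalize


def direct_conv(u, h):
    """Reference O(n^2) causal convolution: y_i = sum_{j<=i} u_j * h_{i-j}."""
    n = len(u)
    return [sum(u[j] * h[i - j] for j in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0, 4.0]
    h = [1.0, 0.5, 0.25, 0.125]  # arbitrary illustrative filter
    print(fft_long_conv(u, h))
    print(direct_conv(u, h))
```

Both functions return the same values up to floating-point error; only the cost differs as the sequence grows.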
Findings
Matches or outperforms equivariant self-attention in accuracy.
Processes 20k tokens 3.5 times faster than an equivariant transformer.
Allows 175 times longer context within the same memory budget.
Abstract
Modeling global geometric context while maintaining equivariance is crucial for accurate predictions in many fields such as biology, chemistry, or vision. Yet, this is challenging due to the computational demands of processing high-dimensional data at scale. Existing approaches, such as equivariant self-attention or distance-based message passing, suffer from quadratic complexity with respect to sequence length, while localized methods sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, in this work, we introduce the SE(3)-Hyena operator, an equivariant long-convolutional model based on the Hyena operator. SE(3)-Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on equivariant associative recall and n-body modeling, SE(3)-Hyena matches or…
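The rotation-equivariance claim can be illustrated with a small numerical check. The sketch below assumes a vector long convolution of the form out_i = sum_{j<=i} q_j × k_{i-j} (the peer reviews on this page mention a convolution form of the cross product), and expands it into six scalar convolutions. It is not taken from the paper's code: `conv`, `cross_conv`, `rotate`, and `equivariance_gap` are hypothetical names, the scalar convolution is written as a direct sum for brevity, and translation handling is not shown.

```python
import math


def conv(a, b):
    """Causal scalar convolution (direct sum; an FFT could replace it)."""
    n = len(a)
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(n)]


def cross_conv(q, k):
    """out_i = sum_{j<=i} q_j x k_{i-j}, built from six scalar convolutions."""
    qx, qy, qz = zip(*q)
    kx, ky, kz = zip(*k)
    cx = [a - b for a, b in zip(conv(qy, kz), conv(qz, ky))]
    cy = [a - b for a, b in zip(conv(qz, kx), conv(qx, kz))]
    cz = [a - b for a, b in zip(conv(qx, ky), conv(qy, kx))]
    return list(zip(cx, cy, cz))


def rotate(v, a=0.7, b=-1.1):
    """Apply a fixed proper rotation: angle a about z, then angle b about x."""
    x, y, z = v
    x, y = math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y
    y, z = math.cos(b) * y - math.sin(b) * z, math.sin(b) * y + math.cos(b) * z
    return (x, y, z)


def equivariance_gap(q, k):
    """Largest componentwise gap between conv(Rq, Rk) and R conv(q, k)."""
    lhs = cross_conv([rotate(v) for v in q], [rotate(v) for v in k])
    rhs = [rotate(v) for v in cross_conv(q, k)]
    return max(abs(a - b) for u, w in zip(lhs, rhs) for a, b in zip(u, w))


if __name__ == "__main__":
    q = [(1.0, 0.0, 2.0), (0.5, -1.0, 0.0), (0.0, 3.0, 1.0)]
    k = [(0.0, 1.0, 0.0), (2.0, 0.0, -1.0), (1.0, 1.0, 1.0)]
    print(equivariance_gap(q, k))  # expected to be at floating-point noise level
```

The check relies on the identity R(a × b) = (Ra) × (Rb) for proper rotations R, together with the linearity of the sum over positions.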
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The idea of using global information to improve the model is very natural, and the convolution simplification of the cross product is very elegant.
> **W1. Lack of discussion on related work.** There are many works that use global features to improve equivariance. Although this paper's work is obviously different from them, it is recommended to add a discussion on these works (e.g. FastEGNN [a], Neural P^3M [b]).
> **W2. The motivation for operator design is unclear.** Why is the motivation for using the cross product not well explained? As we all know, the cross product is the Hodge star dual of the outer product. Can it be explained fr
Applying deep learning to problems that involve modeling geometric context, as done in this paper, is a valuable direction in the field. Performance improvements in this area often depend on architectural advances, which is also good to see.
- The experiments in the paper are not comprehensive enough to clearly demonstrate the advantages of the proposed method over existing ones. For example, baselines used in the paper -- SchNet, EGNN, and SE(3)-Transformer -- have been evaluated on the QM9 dataset in their original papers. It would be more convincing if the authors included results on QM9 as well.
- Several state-of-the-art baselines for dynamical system modeling are missing, making the performance of the proposed model not convi
1. Using a sub-quadratic operator for global context in 3D atomistic modeling can be potentially a good idea.
1. The writing should be greatly improved. See Questions below for more details.
2. Experiments in the paper are very limited. The first two experiments basically tell little about how effective a network architecture can be. The third one does not show the benefit of SE(3)-Hyena except that the proposed method takes less memory.
3. Lack of comparisons to previous works on other better-benchmarked datasets such as MD17, QM9 and so on. Overall, how effective modeling global context is remains unc
Taxonomy
Topics: Neural Networks and Applications
