Harmformer: Harmonic Networks Meet Transformers for Continuous Roto-Translation Equivariance

Tomáš Karella; Adam Harmanec; Jan Kotera; Jan Blažek; Filip Šroubek

arXiv:2411.03794·cs.CV·November 7, 2024

Open Access

TL;DR

Harmformer is a novel harmonic transformer that achieves continuous rotation and translation equivariance, improving robustness and efficiency in image processing tasks without needing rotated training samples.

Contribution

The paper introduces Harmformer, the first transformer with proven continuous rotation and translation equivariance, extending harmonic functions to transformer architectures.
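For readers who want the property stated formally, roto-translation equivariance is commonly written as below. The notation ($\Phi$ for the network, $T_g$ and $T'_g$ for the group actions on inputs and outputs) is an assumption of this note and is not taken from the paper.

$$
\Phi(T_g f) = T'_g\,\Phi(f) \quad \text{for all } g = (t, \theta) \in SE(2), \qquad (T_g f)(x) = f\!\left(R_\theta^{-1}(x - t)\right),
$$

where $t$ is a translation, $R_\theta$ is a rotation by any angle $\theta \in [0^{\circ}, 360^{\circ})$, and $f$ is the input image. "Continuous" means the identity holds for every $\theta$, not only for a finite set of angles.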

Findings

1. Outperforms previous equivariant transformers in accuracy.
2. Demonstrates stability under arbitrary continuous rotations.
3. Operates effectively without training on rotated data.

Abstract

CNNs exhibit inherent equivariance to image translation, leading to efficient parameter and data usage, faster learning, and improved robustness. The concept of translation-equivariant networks has been successfully extended to rotation transformations, using group convolution for discrete rotation groups and harmonic functions for the continuous rotation group encompassing $360^{\circ}$. We explore the compatibility of the self-attention (SA) mechanism with full rotation equivariance, in contrast to previous studies that focused on discrete rotation. We introduce the Harmformer, a harmonic transformer with a convolutional stem that achieves equivariance for both translation and continuous rotation. Accompanied by an end-to-end equivariance proof, the Harmformer not only outperforms previous equivariant transformers, but also demonstrates inherent stability under any continuous rotation, even without seeing…
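The role of harmonic functions mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the Harmformer implementation. It assumes a toy continuous image and a single circular harmonic filter of the form $R(r)\,e^{im\varphi}$, and shows that rotating the input by an arbitrary angle $\theta$ only multiplies the filter response by the phase $e^{im\theta}$. The names `image`, `rotated`, and `harmonic_response`, the radial profile, and the sampling grid are all hypothetical choices made for illustration.

```python
import cmath
import math


def image(x, y):
    """Toy continuous 'image': two off-centre Gaussian blobs (illustrative only)."""
    a = math.exp(-((x - 1.0) ** 2 + (y - 0.5) ** 2))
    b = 0.5 * math.exp(-((x + 0.5) ** 2 + (y + 1.2) ** 2) / 0.8)
    return a + b


def rotated(f, theta):
    """Return f rotated counter-clockwise by theta (radians) about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    # The rotated image at (x, y) equals the original at the back-rotated point.
    return lambda x, y: f(c * x + s * y, -s * x + c * y)


def harmonic_response(f, m, radii=(0.5, 1.0, 1.5, 2.0), n_angles=128):
    """Sum f against the filter R(r) * exp(i*m*phi) on a polar grid at the origin."""
    total = 0j
    for r in radii:
        radial = math.exp(-r * r / 2.0)  # arbitrary radial profile R(r), an assumption
        for k in range(n_angles):
            phi = 2.0 * math.pi * k / n_angles
            value = f(r * math.cos(phi), r * math.sin(phi))
            total += value * radial * cmath.exp(1j * m * phi)
    return total / n_angles


theta = 0.7321  # arbitrary angle, not a multiple of the angular sampling step
m = 2           # rotation order of the filter

base = harmonic_response(image, m)
turned = harmonic_response(rotated(image, theta), m)
predicted = cmath.exp(1j * m * theta) * base  # phase shift expected from the rotation

print("deviation from predicted phase shift:", abs(turned - predicted))
print("change in response magnitude:", abs(abs(turned) - abs(base)))
```

Because the rotation affects only the phase, the magnitude of the response is unchanged for any angle. The image here is an analytic function sampled on a polar grid, so no interpolation is involved; with pixel images, resampling would introduce small additional errors.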

Taxonomy

Topics: Speech and Audio Processing

Methods: Convolution