SONIC: Spectral Oriented Neural Invariant Convolutions

Gijs Joppe Moens; Regina Beets-Tan; Eduardo H. P. Pooch

arXiv:2601.19884·cs.CV·January 28, 2026

Open Access·3 Reviews

TL;DR

SONIC introduces a spectral convolutional approach that captures global context and orientation selectivity, improving robustness and efficiency over traditional CNNs and ViTs across various vision tasks.

Contribution

It proposes a novel spectral parameterisation for convolutions using shared, orientation-selective components, enabling global receptive fields with fewer parameters.

Findings

01. Enhanced robustness to geometric transformations and noise

02. Matches or exceeds performance of existing architectures

03. Uses significantly fewer parameters

Abstract

Convolutional Neural Networks (CNNs) rely on fixed-size kernels scanning local patches, which limits their ability to capture global context or long-range dependencies without very deep architectures. Vision Transformers (ViTs), in turn, provide global connectivity but lack spatial inductive bias, depend on explicit positional encodings, and remain tied to the initial patch size. Bridging these limitations requires a representation that is both structured and global. We introduce SONIC (Spectral Oriented Neural Invariant Convolutions), a continuous spectral parameterisation that models convolutional operators using a small set of shared, orientation-selective components. These components define smooth responses across the full frequency domain, yielding global receptive fields and filters that adapt naturally across resolutions. Across synthetic benchmarks, large-scale image…
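The idea of defining a filter as a continuous function of frequency and sampling it at whatever resolution the input has can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SONIC implementation: the Gaussian form of the response, the function names `oriented_response` and `spectral_conv`, and the parameters `theta`, `scale`, and `damping` are hypothetical choices made for this example.

```python
import numpy as np


def oriented_response(height, width, theta, scale, damping):
    """Sample a continuous, orientation-selective frequency response.

    Assumed form (illustrative only): a Gaussian bump centred at radial
    frequency `scale` along direction `theta`, with bandwidth `damping`.
    """
    # Frequencies in cycles per image extent, so the same continuous
    # function is sampled consistently at any resolution.
    fy = np.fft.fftfreq(height, d=1.0 / height)[:, None]
    fx = np.fft.fftfreq(width, d=1.0 / width)[None, :]

    # Rotate the frequency plane so that `u` runs along the chosen direction.
    u = fx * np.cos(theta) + fy * np.sin(theta)
    v = -fx * np.sin(theta) + fy * np.cos(theta)

    # Using |u| keeps the response even in frequency, so a real input
    # gives an (almost exactly) real output.
    return np.exp(-((np.abs(u) - scale) ** 2 + v ** 2) / (2.0 * damping ** 2))


def spectral_conv(image, theta=0.0, scale=4.0, damping=2.0):
    """Apply the filter globally by pointwise multiplication in the FFT domain."""
    height, width = image.shape
    response = oriented_response(height, width, theta, scale, damping)
    filtered = np.fft.ifft2(np.fft.fft2(image) * response)
    # Drop the tiny imaginary residue left by floating-point error.
    return filtered.real


def horizontal_wave(size, cycles=4):
    """Test pattern: `cycles` oscillations across the image width."""
    x = np.arange(size) / size
    return np.tile(np.cos(2.0 * np.pi * cycles * x), (size, 1))


if __name__ == "__main__":
    for size in (32, 64):
        image = horizontal_wave(size)
        aligned = spectral_conv(image, theta=0.0)
        rotated = spectral_conv(image, theta=np.pi / 2)
        print(size, np.abs(aligned).max(), np.abs(rotated).max())
```

In this sketch the same three numbers define the filter at 32×32 and at 64×64, because the response is evaluated on the frequency grid of each input rather than stored as a fixed-size kernel. A filter aligned with the wave passes it through, while one rotated by 90 degrees suppresses it.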

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 2·Confidence 4

Strengths

- The paper overall is easy to follow.

Weaknesses

- My main concern is the unclear comparisons with the existing solutions. In Section 2 (line 149), the authors mentioned some limitations of previous solutions like GFNet and FNO. However, there is no further study to show how the proposed method overcomes these limitations and why these limitations are important. Besides, there is also no direct comparison with these methods in the experiments. For example, there are two limitations of GFNet mentioned: "the FFT grid is tied to the input resolut

Reviewer 02·Rating 8·Confidence 2

Strengths

The paper is well written, very well motivated and easy to follow. The presented idea is novel and presents an interesting concept, bringing core elements of state-space systems into the frequency domain. The fact that the learnable filters are formulated in a continuous parameterization is particularly noteworthy. This actually opens the door to solving many sampling-related problems in current network designs (aliasing causing low robustness, limitation to fixed input sizes, low robustness

Weaknesses

There are several aspects in which the paper could be improved: 1) the paper mentions several previous approaches to Fourier-domain feature extraction (page 3 bottom), but does not compare to these methods in the experiments or in terms of computational complexity 2) the authors neither discuss nor compare to [1], another Fourier-domain approach to efficient large-kernel implementations. 3) the experiments comparing to CNNs do not show the kernel size used (also not in the appendix). H

Reviewer 03·Rating 4·Confidence 2

Strengths

1. The paper presents a spectral framework for multidimensional signals that offers global receptive fields, complete convolutional capability, and built-in resolution invariance. It provides a lightweight and flexible foundation for building scalable, adaptable vision models. 2. The paper presents comprehensive empirical validation across both synthetic and real-world settings.

Weaknesses

1. The innovations mentioned in the abstract and contributions are mainly about unifying existing convolution kernels, spectral filtering, and state-space kernels under one spectral framework. However, the idea of parameterizing operators in the frequency or linear domain already exists in models such as S4ND, GFNet, FNO, and Mamba. The so-called directional modes only add a few interpretable parameters (e.g., direction, scale, damping) to the frequency response function, but in essence, it is s

Taxonomy

Topics: Face Recognition and Perception · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning