SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation
Guhnoo Yun, Juhan Yoo, Kijung Kim, Jeongho Lee, Dong Hwan Kim

TL;DR
This paper introduces SPANet, a spectral pooling-based token mixer that balances high- and low-frequency features to improve model performance across various computer vision tasks.
Contribution
The paper proposes SPAM (Spectral Pooling Aggregation Modulation), a spectral pooling-based token mixer, together with an approach for balancing high- and low-frequency representations to improve vision model performance.
Findings
Balanced frequency representations improve model accuracy
Spectral pooling enhances feature extraction across tasks
SPAM outperforms existing token mixers in experiments
Abstract
Recent studies show that self-attentions behave like low-pass filters (as opposed to convolutions) and that enhancing their high-pass filtering capability improves model performance. Contrary to this idea, we investigate existing convolution-based models with spectral analysis and observe that improving the low-pass filtering in convolution operations also leads to performance improvement. To account for this observation, we hypothesize that utilizing optimal token mixers that capture balanced representations of both high- and low-frequency components can enhance the performance of models. We verify this by decomposing visual features into the frequency domain and combining them in a balanced manner. To handle this, we replace the balancing problem with a mask filtering problem in the frequency domain. Then, we introduce a novel token-mixer named SPAM and leverage it to derive a MetaFormer…
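The idea of treating frequency balancing as mask filtering can be illustrated with a small sketch. It is not the SPAM token mixer itself: the abstract above is truncated and does not specify the mask shape, the cutoff, or how components are aggregated. The function names, the square low-pass mask, the `radius` cutoff, and the `alpha` mixing weight are all assumptions made for illustration. The sketch uses a naive DFT from the Python standard library so that it runs without extra dependencies; a real model would use an FFT routine on tensors.

```python
import cmath


def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(grid, inverse=False):
    """2-D DFT applied separably: first along rows, then along columns."""
    rows = [dft(r, inverse) for r in grid]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def low_pass_mask(h, w, radius):
    """Hypothetical square mask: 1 for low frequencies, 0 elsewhere.

    Frequency indices wrap around, so index k and n - k are equally 'low'.
    """
    def dist(k, n):
        return min(k, n - k)

    return [
        [1.0 if dist(u, h) <= radius and dist(v, w) <= radius else 0.0 for v in range(w)]
        for u in range(h)
    ]


def split_and_balance(feature, radius, alpha):
    """Split a 2-D feature map into low/high-frequency parts and remix them.

    alpha weights the low-frequency part and (1 - alpha) the high-frequency
    part; both parameters are illustrative assumptions, not from the paper.
    """
    h, w = len(feature), len(feature[0])
    spec = dft2(feature)
    mask = low_pass_mask(h, w, radius)

    low_spec = [[spec[u][v] * mask[u][v] for v in range(w)] for u in range(h)]
    high_spec = [[spec[u][v] * (1.0 - mask[u][v]) for v in range(w)] for u in range(h)]

    # The mask is symmetric, so the inverse transforms are real up to rounding.
    low = [[z.real for z in row] for row in dft2(low_spec, inverse=True)]
    high = [[z.real for z in row] for row in dft2(high_spec, inverse=True)]

    balanced = [
        [alpha * low[u][v] + (1.0 - alpha) * high[u][v] for v in range(w)]
        for u in range(h)
    ]
    return low, high, balanced
```

Calling `split_and_balance(feature, radius=0, alpha=0.5)` on a small 2-D list keeps only the DC term in the low-frequency part, so that part is a constant map equal to the feature mean, and the low and high parts always sum back to the input. Raising `alpha` above 0.5 emphasizes low frequencies; lowering it emphasizes high frequencies.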
Taxonomy
Topics: Image Enhancement Techniques · Visual Attention and Saliency Detection · CCD and CMOS Imaging Sensors
Methods: MetaFormer · Convolution
