FFNet: MetaMixer-based Efficient Convolutional Mixer Design
Seokju Yun, Dongheon Lee, Youngmin Ro

TL;DR
FFNet is an efficient convolutional mixer architecture obtained by converting self-attention into a more FFN-like token mixer. It is reported to achieve strong performance across vision tasks while using simpler operators.
Contribution
The paper proposes FFNification, which converts self-attention into a convolution-based token mixer, and introduces two architectures, MetaMixer and FFNet, that are reported to outperform more complex models at higher efficiency.
Findings
FFNet is reported to outperform state-of-the-art methods on multiple vision benchmarks.
FFNet is reported to be notably more efficient than existing models.
MetaMixer is a flexible, general architecture that does not specify its sub-operations.
Abstract
Transformer, composed of self-attention and Feed-Forward Network, has revolutionized the landscape of network design across various vision tasks. While self-attention is extensively explored as a key factor in performance, FFN has received little attention. FFN is a versatile operator seamlessly integrated into nearly all AI models to effectively harness rich representations. Recent works also show that FFN functions like key-value memories. Thus, akin to the query-key-value mechanism within self-attention, FFN can be viewed as a memory network, where the input serves as query and the two projection weights operate as keys and values, respectively. Based on these observations, we hypothesize that the importance lies in query-key-value framework itself for competitive performance. To verify this, we propose converting self-attention into a more FFN-like efficient token mixer with only…
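The key-value memory view described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`memory_read`, `attention_read`, `ffn_read`), the bias-free FFN, and the choice of ReLU as the activation are assumptions made only for illustration. The point is that self-attention and an FFN can both be written as the same query-key-value read, differing only in where the keys and values come from and in the activation applied to the scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relu(scores):
    """Element-wise ReLU (an assumed activation; real models often use GELU)."""
    return [max(0.0, s) for s in scores]


def memory_read(query, keys, values, activation):
    """Generic query-key-value read.

    Scores the query against every key, applies the activation to the
    scores, and returns the activation-weighted sum of the values.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = activation(scores)
    out_dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]


def attention_read(query, keys, values):
    """Single-query attention: keys and values are computed from the input tokens."""
    scale = 1.0 / math.sqrt(len(query))
    return memory_read([q * scale for q in query], keys, values, softmax)


def ffn_read(x, w1, w2):
    """Bias-free FFN seen as a memory: rows of w1 are keys, rows of w2 are values.

    Unlike attention, the keys and values are learned weights shared by
    every input, and the activation is ReLU rather than softmax.
    """
    return memory_read(x, w1, w2, relu)


x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # three "keys" (hidden size 3)
w2 = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]    # three "values"
ffn_out = ffn_read(x, w1, w2)
attn_out = attention_read(x, w1, w2)
```

In this toy example the third key scores negatively against `x`, so ReLU switches its value off entirely, whereas softmax attention always assigns every value a non-zero weight.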
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Graph Neural Networks
Methods: ConvNeXt · Convolution
