Adaptive Split-Fusion Transformer
Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

TL;DR
The paper introduces ASF-former, a hybrid neural network model that adaptively combines convolutional and transformer features, achieving superior accuracy on image classification benchmarks without large-scale pre-training.
Contribution
We propose the Adaptive Split-Fusion Transformer (ASF-former), a novel hybrid model that adaptively weights convolutional and attention features for improved visual content understanding.
Findings
Outperforms CNN and transformer counterparts on ImageNet-1K, reaching 83.9% top-1 accuracy.
Achieves this accuracy at moderate computational cost (12.9G MACs, 56.7M parameters).
Effective on multiple benchmarks including CIFAR-10 and CIFAR-100.
Abstract
Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers. The former (CNN) relies on small-windowed kernels to capture regional clues, demonstrating solid local expressiveness. In contrast, the latter (transformer) establishes long-range global connections between localities for holistic learning. This complementary nature has inspired growing interest in designing hybrid models that make the best use of each technique. Current hybrids merely substitute convolutions for linear projections as simple approximations, or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former) that treats convolutional and attention branches differently with adaptive weights. Specifically, an ASF-former…
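The abstract above is cut off before it details the architecture, so the following is only a minimal NumPy sketch of one plausible reading of "adaptive split-fusion" for a sequence of N tokens with C channels. Every design choice here is an assumption, not taken from the paper: splitting the channels equally between the two branches, a 1-D depthwise convolution over the token axis standing in for the convolutional branch, single-head self-attention, a sigmoid gate over mean-pooled features that yields one scalar weight alpha, concatenating the alpha-weighted and (1 - alpha)-weighted branch outputs, and the residual connection. All function and parameter names (`split_fusion_block`, `gate_w`, etc.) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv_path(x, kernel):
    # Hypothetical local branch: depthwise 1-D convolution over the token axis,
    # zero-padded so the output keeps N tokens. x: (N, d), kernel: (k, d).
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return sum(kernel[i] * xp[i:i + x.shape[0]] for i in range(k))


def attention_path(x, wq, wk, wv):
    # Hypothetical global branch: single-head scaled dot-product self-attention.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def gate(x, w, b):
    # Hypothetical adaptive weight: sigmoid of a linear map of the
    # token-averaged features, giving one scalar alpha in (0, 1).
    pooled = x.mean(axis=0)  # (C,)
    return 1.0 / (1.0 + np.exp(-(pooled @ w + b)))


def split_fusion_block(x, params):
    n, c = x.shape
    assert c % 2 == 0, "channel count must split evenly"
    # Split: half the channels go to each branch (assumed equal split).
    x_conv, x_attn = x[:, : c // 2], x[:, c // 2:]
    y_conv = conv_path(x_conv, params["kernel"])
    y_attn = attention_path(x_attn, params["wq"], params["wk"], params["wv"])
    # Fusion: weight the branches by alpha and (1 - alpha), restore C channels.
    alpha = gate(x, params["gate_w"], params["gate_b"])
    fused = np.concatenate([alpha * y_conv, (1.0 - alpha) * y_attn], axis=1)
    return x + fused, alpha  # residual connection (assumed)


# Usage with random, untrained parameters.
rng = np.random.default_rng(0)
N, C = 16, 8
d = C // 2
params = {
    "kernel": rng.normal(size=(3, d)),
    "wq": rng.normal(size=(d, d)),
    "wk": rng.normal(size=(d, d)),
    "wv": rng.normal(size=(d, d)),
    "gate_w": rng.normal(size=C),
    "gate_b": 0.0,
}
x = rng.normal(size=(N, C))
y, alpha = split_fusion_block(x, params)
print(y.shape, float(alpha))
```

Because alpha is computed from the input itself, the balance between the local and global branches can differ from one input to the next, which is the intuition the abstract gives for weighting the branches differently rather than simply juxtaposing them.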
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout · Adam
