Adaptive Split-Fusion Transformer

Zixuan Su; Hao Zhang; Jingjing Chen; Lei Pang; Chong-Wah Ngo; Yu-Gang Jiang

arXiv:2204.12196 · cs.CV · August 17, 2023 · 1 citation

TL;DR

The paper introduces ASF-former, a hybrid neural network model that adaptively combines convolutional and transformer features, achieving superior accuracy on image classification benchmarks without large-scale pre-training.

Contribution

We propose the Adaptive Split-Fusion Transformer (ASF-former), a novel hybrid model that adaptively weights convolutional and attention features for improved visual content understanding.
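To make "adaptively weights convolutional and attention features" concrete, here is a minimal NumPy sketch of one way such a fusion can work. It is an illustration under assumptions, not the ASF-former architecture: the function names (`local_branch`, `global_branch`, `adaptive_fuse`), the 1-D token layout, the identity attention projections, and the sigmoid gate driven by average-pooled features are all hypothetical choices made for this example.

```python
import numpy as np


def local_branch(x, kernel):
    """Depthwise 1-D convolution over the token axis (stand-in for a conv branch).

    x: float array of shape (tokens, channels).
    kernel: array of shape (k,) with odd k, shared across channels (assumption).
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the token axis only
    out = np.zeros_like(x)
    for offset in range(k):
        # Each kernel tap weights a shifted copy of the sequence.
        out += kernel[offset] * padded[offset:offset + x.shape[0]]
    return out


def global_branch(x):
    """Single-head self-attention with identity projections (stand-in for attention)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x


def adaptive_fuse(x, kernel, gate_w, gate_b):
    """Blend both branches with a scalar alpha computed from the input itself."""
    pooled = x.mean(axis=0)  # (channels,) average over tokens
    alpha = 1.0 / (1.0 + np.exp(-(pooled @ gate_w + gate_b)))  # sigmoid, in (0, 1)
    fused = alpha * local_branch(x, kernel) + (1.0 - alpha) * global_branch(x)
    return fused, alpha


# Hypothetical toy input: 6 tokens with 4 channels each.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
fused, alpha = adaptive_fuse(
    x,
    kernel=np.array([0.25, 0.5, 0.25]),
    gate_w=np.zeros(4),  # zero gate weights and bias give an even 0.5 / 0.5 blend
    gate_b=0.0,
)
print(fused.shape, alpha)
```

In this sketch a single scalar gate keeps the blend cheap and makes the balance between local and global modeling depend on the input rather than being fixed at design time.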

Findings

01. Outperforms CNN and transformer models on ImageNet-1K with 83.9% accuracy.
02. Achieves high accuracy with moderate computational cost (12.9G MACs, 56.7M Params).
03. Effective on multiple benchmarks including CIFAR-10 and CIFAR-100.
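For readers unfamiliar with the MACs and parameter figures quoted in the findings, the sketch below shows how such counts are commonly tallied for two basic layer types. The helper names (`linear_cost`, `conv2d_cost`) and the layer sizes are hypothetical and are not taken from the paper; the sketch also assumes the usual convention that bias additions are not counted as MACs.

```python
def linear_cost(in_features, out_features, tokens=1, bias=True):
    """Return (parameters, MACs) for a linear layer applied to `tokens` inputs."""
    params = in_features * out_features + (out_features if bias else 0)
    macs = tokens * in_features * out_features  # one multiply-accumulate per weight per token
    return params, macs


def conv2d_cost(c_in, c_out, kernel, out_h, out_w, groups=1, bias=True):
    """Return (parameters, MACs) for a square-kernel 2-D convolution."""
    weights_per_out_channel = kernel * kernel * (c_in // groups)
    params = weights_per_out_channel * c_out + (c_out if bias else 0)
    # Every output position of every output channel applies its full filter once.
    macs = weights_per_out_channel * c_out * out_h * out_w
    return params, macs


# Hypothetical sizes: 196 tokens (a 14 x 14 grid) with 384 channels.
print(linear_cost(384, 384, tokens=196))
print(conv2d_cost(384, 384, kernel=3, out_h=14, out_w=14, groups=384))
```

Summing these per-layer counts over a whole network gives totals of the kind reported above; profiling tools may differ slightly in what they include.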

Abstract

Neural networks for visual content understanding have recently evolved from convolutional networks (CNNs) to transformers. The former (CNN) relies on small-windowed kernels to capture regional clues, demonstrating solid local expressiveness. In contrast, the latter (transformer) establishes long-range global connections between localities for holistic learning. Inspired by this complementary nature, there is growing interest in designing hybrid models that make the best use of each technique. Current hybrids merely use convolutions as simple approximations of linear projections or juxtapose a convolution branch with attention, without considering the relative importance of local and global modeling. To tackle this, we propose a new hybrid named Adaptive Split-Fusion Transformer (ASF-former), which treats the convolutional and attention branches differently with adaptive weights. Specifically, an ASF-former…

Code & Models

Repositories

szx503045266/asf-former (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout · Adam