SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition
Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang

TL;DR
SeaFormer++ introduces a squeeze-enhanced axial transformer architecture optimized for mobile visual recognition, achieving superior accuracy and efficiency on various datasets and tasks, including segmentation, classification, and detection.
Contribution
The paper proposes a novel squeeze-enhanced axial transformer backbone that balances performance and computational cost for mobile vision applications.
Findings
Outperforms mobile-friendly rivals and Transformer counterparts in accuracy and latency.
Reduces inference latency via multi-resolution distillation.
Demonstrates versatility across segmentation, classification, and detection tasks.
Abstract
Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which had been overwhelmingly dominated by CNNs, has recently been significantly revolutionized. However, their computational cost and memory requirements render these methods unsuitable for mobile devices. In this paper, we introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze axial attention and detail enhancement. It can be further used to create a family of backbone architectures with superior cost-effectiveness. Coupled with a light segmentation head, SeaFormer achieves the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K, Cityscapes, Pascal Context and COCO-Stuff datasets. Critically, we beat…
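The abstract names squeeze axial attention without spelling it out. The following is a minimal NumPy sketch of the core idea as we understand it from the original SeaFormer design, not the authors' implementation. It assumes single-head attention on already-projected `(H, W, C)` query, key and value maps, and it omits the learned projections, the squeeze axial positional embeddings and the detail-enhancement branch. The function names `squeeze_axial_attention` and `attention_score_entries` are hypothetical. The idea: average each map along the width to get `H` row tokens, average along the height to get `W` column tokens, attend within each set, then broadcast the two results back to `H x W`. This replaces an `(HW) x (HW)` score matrix with `H x H` and `W x W` ones.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention over token sequences of shape (L, C).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def squeeze_axial_attention(q, k, v):
    """Hypothetical sketch; q, k, v are (H, W, C) maps, returns (H, W, C)."""
    # Horizontal squeeze: average over the width -> H row tokens, shape (H, C).
    q_row, k_row, v_row = (t.mean(axis=1) for t in (q, k, v))
    # Vertical squeeze: average over the height -> W column tokens, shape (W, C).
    q_col, k_col, v_col = (t.mean(axis=0) for t in (q, k, v))
    out_row = attention(q_row, k_row, v_row)  # (H, C)
    out_col = attention(q_col, k_col, v_col)  # (W, C)
    # Broadcast back to the full grid: (H, 1, C) + (1, W, C) -> (H, W, C).
    return out_row[:, None, :] + out_col[None, :, :]


def attention_score_entries(h, w):
    # Score-matrix entries per head: global attention vs. squeeze axial attention.
    return (h * w) ** 2, h * h + w * w


rng = np.random.default_rng(0)
H, W, C = 8, 6, 4
q, k, v = (rng.standard_normal((H, W, C)) for _ in range(3))
out = squeeze_axial_attention(q, k, v)
print(out.shape)                        # (8, 6, 4)
print(attention_score_entries(64, 64))  # (16777216, 8192)
```

On a 64 x 64 feature map, the sketch's count drops from about 16.8 million score entries to 8,192. This accounting ignores heads, projections and the convolutional detail branch, so it illustrates the scaling argument rather than the paper's measured cost.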
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
