SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition

Qiang Wan; Zilong Huang; Jiachen Lu; Gang Yu; Li Zhang

arXiv:2301.13156 · cs.CV · February 10, 2025 · 82 cites

TL;DR

SeaFormer++ introduces a squeeze-enhanced axial transformer architecture optimized for mobile visual recognition, achieving superior accuracy and efficiency on various datasets and tasks, including segmentation, classification, and detection.

Contribution

The paper proposes a novel squeeze-enhanced axial transformer backbone that balances performance and computational cost for mobile vision applications.

Findings

01. Outperforms mobile-friendly rivals and Transformer counterparts in accuracy and latency.

02. Reduces inference latency via multi-resolution distillation.

03. Demonstrates versatility across segmentation, classification, and detection tasks.

Abstract

Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which had been overwhelmingly dominated by CNNs, has recently been significantly revolutionized. However, the computational cost and memory requirements render these methods unsuitable for mobile devices. In this paper, we introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile visual recognition. Specifically, we design a generic attention block characterized by the formulation of squeeze Axial and detail enhancement. It can be further used to create a family of backbone architectures with superior cost-effectiveness. Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K, Cityscapes, Pascal Context and COCO-Stuff datasets. Critically, we beat…
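The abstract only names the "squeeze Axial" idea, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the feature map is squeezed by mean pooling along each spatial axis, that attention runs separately over the resulting row and column sequences, and that the two results are broadcast back and summed. The identity query/key/value projections, the single head, and the omission of the detail-enhancement branch are all simplifications made here for illustration; the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def self_attention(seq):
    """Single-head self-attention over a list of C-dim vectors.

    Assumption: Q, K and V are the inputs themselves (identity projections).
    """
    c = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(c)])
    return out


def squeeze_axial_attention(x):
    """x is an H x W x C nested list; returns an H x W x C nested list."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    # Squeeze: average over the width to get H row vectors,
    # and over the height to get W column vectors.
    rows = [[sum(x[h][w][c] for w in range(W)) / W for c in range(C)] for h in range(H)]
    cols = [[sum(x[h][w][c] for h in range(H)) / H for c in range(C)] for w in range(W)]
    # Attention runs on sequences of length H and W, not on H * W tokens.
    row_att = self_attention(rows)
    col_att = self_attention(cols)
    # Broadcast both results back onto the H x W grid and sum them.
    return [[[row_att[h][c] + col_att[w][c] for c in range(C)]
             for w in range(W)] for h in range(H)]


feature_map = [[[1.0, 1.0] for _ in range(3)] for _ in range(2)]  # H=2, W=3, C=2
out = squeeze_axial_attention(feature_map)
```

In this sketch the number of pairwise attention scores is H² + W² rather than (H·W)² for full self-attention over all positions, which is where the saving would come from under these assumptions.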

Code & Models

Repositories

fudan-zvg/seaformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques