Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

Hui Shen; Zhongwei Wan; Xin Wang; Mi Zhang

arXiv:2409.09808·cs.CV·October 8, 2024

TL;DR

Famba-V is a cross-layer token fusion method for Vision Mamba (Vim) models. It fuses similar tokens according to cross-layer strategies rather than uniformly in every layer, which reduces training time and peak memory usage and yields better accuracy-efficiency trade-offs.

Contribution

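The paper presents Famba-V, a cross-layer token fusion technique that improves the training efficiency of Vim models. Unlike existing approaches that apply token fusion uniformly across all layers, it identifies and fuses similar tokens according to a suite of cross-layer strategies.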

Findings

01 Reduces training time and peak memory usage

02 Achieves better accuracy-efficiency trade-offs

03 Demonstrates effectiveness on CIFAR-100

Abstract

Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on the Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suite of cross-layer strategies, instead of applying token fusion uniformly across all layers as existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V enhances the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. Together, these results demonstrate Famba-V as a promising efficiency…
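The abstract does not spell out the fusion rule or the individual cross-layer strategies, so the following is only a minimal sketch of the general idea under stated assumptions: similarity is measured with cosine similarity, tokens at even positions are matched against tokens at odd positions, matched pairs are fused by averaging, and fusion runs only in the layers listed in `fusion_layers`. The names `cosine`, `fuse_tokens`, and `run_layers` are hypothetical and are not taken from the aiot-mlsys-lab/famba-v repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fuse_tokens(tokens, r):
    """Fuse up to r of the most similar token pairs by averaging.

    Assumption: even-position tokens (sources) may be merged into their
    most similar odd-position token (destination). Sequence order is kept.
    """
    src = tokens[0::2]
    dst = [list(t) for t in tokens[1::2]]
    if not dst or r <= 0:
        return [list(t) for t in tokens]

    # For every source token, find its most similar destination token.
    best = []
    for i, s in enumerate(src):
        sims = [cosine(s, d) for d in dst]
        j = max(range(len(dst)), key=sims.__getitem__)
        best.append((sims[j], i, j))

    # Keep only the r most similar (source, destination) pairs.
    merged = {i: j for _, i, j in sorted(best, reverse=True)[:r]}

    # Fold each selected source into its destination as a running mean.
    counts = [1] * len(dst)
    for i, j in merged.items():
        dst[j] = [(d * counts[j] + s) / (counts[j] + 1)
                  for d, s in zip(dst[j], src[i])]
        counts[j] += 1

    # Rebuild the sequence in its original order, dropping merged sources.
    out = []
    for pos in range(len(tokens)):
        if pos % 2 == 0:
            if pos // 2 not in merged:
                out.append(list(tokens[pos]))
        else:
            out.append(dst[pos // 2])
    return out


def run_layers(tokens, num_layers, fusion_layers, r):
    """Apply fusion only in the layers named in fusion_layers.

    The layer computation itself is omitted; only the token count changes.
    Returns the final tokens and the sequence length after each layer.
    """
    lengths = []
    for layer in range(num_layers):
        # A real Vim block would transform the tokens here (omitted).
        if layer in fusion_layers:
            tokens = fuse_tokens(tokens, r)
        lengths.append(len(tokens))
    return tokens, lengths
```

Calling `run_layers(tokens, 4, {1, 3}, 1)` fuses one pair in layers 1 and 3 only, while passing `set(range(4))` would correspond to the uniform, every-layer fusion that the abstract contrasts against. Which layers to select is exactly the choice the paper's cross-layer strategies address, and the sets used here are placeholders.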

Code & Models

Repositories

aiot-mlsys-lab/famba-v (PyTorch, official)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Image and Object Detection Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Label Smoothing · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Residual Connection