Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model
Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, and Chen Change Loy

TL;DR
This paper introduces RWKV-SAM, a fast and efficient segmentation model combining convolution and RWKV architectures, achieving superior accuracy and speed compared to transformer-based models on multiple datasets.
Contribution
The paper designs a novel mixed backbone with convolution and RWKV, and a new efficient decoder, establishing a high-performance, efficient segmentation baseline.
Findings
RWKV-SAM achieves over 2x speedup compared to transformer models.
It outperforms recent Mamba models in segmentation accuracy.
On the authors' benchmark of high-quality segmentation datasets, RWKV-SAM shows better efficiency and segmentation quality than the compared models.
Abstract
Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifically, we design a mixed backbone that contains convolution and RWKV operations, which performs best in both accuracy and efficiency. In addition, we design an efficient decoder that uses the multiscale tokens to obtain high-quality masks. We denote our method as RWKV-SAM, a simple, effective, and fast baseline for SAM-like models. Moreover, we build a benchmark containing various high-quality segmentation datasets and jointly train one efficient yet high-quality segmentation model using this…
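To make the "long sequences" point concrete, here is a minimal sketch of why a recurrent formulation scales linearly with the number of tokens. It uses a simplified, single-channel, causal WKV weighting in the style of RWKV-4; this is an illustrative assumption, not the operator used in RWKV-SAM, and it omits the numerical stabilization real implementations need. The function names (`num_tokens`, `wkv_direct`, `wkv_recurrent`), the parameters `w` (decay) and `u` (current-token bonus), and the 16-pixel patch size are all hypothetical choices made for the example.

```python
import math


def num_tokens(height, width, patch=16):
    """Token count for an image split into non-overlapping square patches
    (the patch size is an assumption for illustration)."""
    return (height // patch) * (width // patch)


def wkv_direct(k, v, w, u):
    """Reference form: every output re-sums all earlier positions.
    Total work grows as O(T^2) in the sequence length T."""
    out = []
    for t in range(len(k)):
        # The current token gets a separate bonus u instead of a decay.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # Earlier tokens are down-weighted by how far back they are.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k, v, w, u):
    """Same quantity computed with two running sums.
    Total work grows as O(T), with constant state per channel."""
    a = 0.0  # running decayed sum of exp(k_i) * v_i over past tokens
    b = 0.0  # running decayed sum of exp(k_i) over past tokens
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        # Fold the current token into the state for the next step.
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out


if __name__ == "__main__":
    k = [0.1, -0.3, 0.5, 0.2]
    v = [1.0, 2.0, -1.0, 0.5]
    direct = wkv_direct(k, v, w=0.5, u=0.3)
    recurrent = wkv_recurrent(k, v, w=0.5, u=0.3)
    # Both forms agree up to floating-point error.
    assert all(abs(x - y) < 1e-9 for x, y in zip(direct, recurrent))
    n = num_tokens(1024, 1024)
    print(n, n * n)  # tokens vs. pairwise interactions in full attention
```

Under these assumptions, a 1024×1024 input with 16-pixel patches yields 4096 tokens, so full pairwise attention touches 4096² token pairs per layer while the recurrent form makes a single pass over the 4096 tokens. This only illustrates the asymptotic gap; it is not a derivation of the speedup reported for RWKV-SAM.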
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Customer churn and segmentation
Methods: Softmax · Attention Is All You Need · Focus · Convolution
