MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, and Ying-Cong Chen

arXiv:2407.02228 · cs.CV · July 16, 2024 · 1 citation

TL;DR

MTMamba is a Mamba-based architecture for multi-task dense scene understanding. Its specialized blocks model long-range dependencies and cross-task interactions, and it outperforms Transformer-based and CNN-based methods on the NYUDv2 and PASCAL-Context datasets.

Contribution

The paper proposes MTMamba, a novel Mamba-based architecture with self-task and cross-task blocks for enhanced multi-task dense scene understanding.

Findings

01

Outperforms Transformer-based and CNN-based methods on NYUDv2 and PASCAL-Context.

02

On PASCAL-Context, improves over the previous best methods by +2.08, +5.01, and +4.90 in semantic segmentation, human parsing, and object boundary detection, respectively.

03

Demonstrates effective modeling of long-range dependencies and task interactions.

Abstract

Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging Mamba, while CTM explicitly models task interactions to facilitate information exchange across tasks. Experiments on NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best methods in the tasks of…
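The two ideas named in the abstract, carrying information over long ranges and exchanging information across tasks, can be illustrated with a toy sketch. This is not the MTMamba implementation: Mamba builds on selective state-space models whose parameters depend on the input, whereas the sketch below uses a fixed scalar recurrence. The function names `ssm_scan` and `cross_task_mix`, the averaging step, and the `gate` parameter are all hypothetical and chosen only for illustration.

```python
# Toy illustration only; not the MTMamba implementation.
from typing import Dict, List


def ssm_scan(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state carries information from all earlier positions
        ys.append(c * h)
    return ys


def cross_task_mix(task_feats: Dict[str, List[float]], gate: float = 0.5) -> Dict[str, List[float]]:
    """Hypothetical cross-task exchange (an assumption, not the paper's CTM block)."""
    names = list(task_feats)
    length = len(task_feats[names[0]])
    # Average the task sequences position by position, then scan the average.
    mean = [sum(task_feats[n][t] for n in names) / len(names) for t in range(length)]
    shared = ssm_scan(mean)
    # Blend each task's own sequence with the shared, scanned sequence.
    return {
        n: [gate * own + (1.0 - gate) * s for own, s in zip(task_feats[n], shared)]
        for n in names
    }


if __name__ == "__main__":
    # Hypothetical 1-D "features" for two tasks; only the first task has a signal.
    feats = {"semseg": [1.0, 0.0, 0.0, 0.0], "boundary": [0.0, 0.0, 0.0, 0.0]}
    print(ssm_scan(feats["semseg"]))
    print(cross_task_mix(feats)["boundary"])
```

In this toy setup, an impulse at the first position still influences later outputs of the scan, and the task with an all-zero input receives a nonzero signal after mixing, which is the kind of information flow the STM and CTM blocks are described as providing.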

Code & Models

Repositories

envision-research/mtmamba (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Advanced Vision and Imaging