MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, and Ying-Cong Chen

arXiv:2408.15101 · cs.CV · July 29, 2025

TL;DR

MTMamba++ is a multi-task dense scene understanding architecture whose Mamba-based decoders capture long-range dependencies and cross-task interactions. It is reported to outperform CNN-, Transformer-, and diffusion-based methods on NYUDv2, PASCAL-Context, and Cityscapes.

Contribution

The paper presents MTMamba++, a new architecture with Mamba-based decoders that explicitly model long-range dependencies and cross-task interactions for improved multi-task scene understanding.
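To make "cross-task interaction" concrete, here is a minimal sketch of one generic way per-task features can exchange information: each task's feature vector is blended with a feature shared across tasks through a gate. This is an illustration under stated assumptions, not the paper's CTM block. The function `cross_task_fuse`, the task names `semseg` and `depth`, the mean-pooled shared feature, and the scalar gate logits are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def cross_task_fuse(task_feats, gate_logits):
    """Blend each task's feature vector with a shared feature.

    task_feats:  dict mapping task name -> list of floats (same length per task)
    gate_logits: dict mapping task name -> float; sigmoid(logit) is the weight
                 kept on the task's own feature (hypothetical scalar gate)
    """
    names = list(task_feats)
    dim = len(task_feats[names[0]])
    # Shared feature: element-wise mean over tasks (an assumption for this
    # sketch; a learned module would normally produce it).
    shared = [sum(task_feats[n][i] for n in names) / len(names) for i in range(dim)]
    fused = {}
    for n in names:
        g = sigmoid(gate_logits[n])
        # Convex combination of the task-specific and shared features.
        fused[n] = [g * f + (1.0 - g) * s for f, s in zip(task_feats[n], shared)]
    return fused


# Hypothetical two-task example with 2-dimensional features.
feats = {"semseg": [1.0, 0.0], "depth": [0.0, 1.0]}
fused = cross_task_fuse(feats, {"semseg": 0.0, "depth": 0.0})
```

With a gate logit of 0 the gate is 0.5, so each task keeps half of its own feature and takes half of the shared one. In a trained model the gate would be predicted from the input rather than fixed.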

Findings

01  Outperforms CNN, Transformer, and diffusion-based methods on NYUDv2, PASCAL-Context, and Cityscapes.

02  Effectively models long-range dependencies using state-space models.

03  Maintains high computational efficiency while achieving superior accuracy.

Abstract

Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependencies and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block. STM handles long-range dependencies by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two types of CTM blocks, namely F-CTM and S-CTM, to enhance cross-task interaction from feature and semantic perspectives, respectively. Extensive experiments on NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior…
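The abstract attributes long-range modeling to state-space models. As a rough illustration of why such models can carry information across a long sequence in linear time, the sketch below runs a scalar discrete state-space recurrence, h_t = a * h_{t-1} + b * x_t and y_t = c * h_t, over a list of inputs. The function `ssm_scan` and its fixed parameters `a`, `b`, `c` are assumptions made for this example. Mamba makes its parameters input-dependent ("selective") and operates on multi-channel features, neither of which is shown here.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    One pass over the input, so the cost is linear in sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


# Impulse at the first position: later outputs still depend on it,
# decaying by a factor of `a` per step.
impulse = [1.0] + [0.0] * 4
response = ssm_scan(impulse, a=0.5)
# response == [1.0, 0.5, 0.25, 0.125, 0.0625]
```

The impulse response shows the first token influencing every later output through the hidden state alone, with no pairwise comparison between positions as in attention.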

Code & Models

Repositories

envision-research/mtmamba (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Advanced Vision and Imaging

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces