Deep Mamba Multi-modal Learning
Jian Zhu, Xin Zou, Yu Cui, Zhangmin Huang, Chenshu Hu, Bo Lyu

TL;DR
This paper introduces Deep Mamba Multi-modal Learning (DMML), a novel approach for multi-modal feature fusion, and proposes Deep Mamba Multi-modal Hashing (DMMH) for multimedia retrieval, reporting state-of-the-art results on three public datasets.
Contribution
The paper presents a new deep learning framework inspired by Mamba networks for multi-modal fusion and introduces DMMH, combining accuracy and speed for multimedia retrieval.
Findings
DMMH achieves state-of-the-art performance on three datasets.
DMML effectively fuses multi-modal features.
DMMH balances accuracy and inference speed.
Abstract
Inspired by the excellent performance of Mamba networks, we propose Deep Mamba Multi-modal Learning (DMML), a novel method for fusing multi-modal features. We apply DMML to multimedia retrieval and propose Deep Mamba Multi-modal Hashing (DMMH), which combines algorithmic accuracy with fast inference. We validate DMMH on three public datasets, where it achieves state-of-the-art results.
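To illustrate why hashing-based retrieval is fast at inference time, here is a minimal sketch of the generic pipeline: fuse per-modality features, map the fused vector to a short binary code, and rank database items by Hamming distance. It is not the DMMH method. The concatenation in `fuse` is only a placeholder for the Mamba-based fusion, the random-hyperplane projection stands in for a learned hashing layer, and all function names are hypothetical.

```python
import random


def fuse(image_feat, text_feat):
    # Placeholder fusion (assumption): plain concatenation.
    # DMML's Mamba-based fusion is NOT reproduced here.
    return list(image_feat) + list(text_feat)


def make_projection(dim, n_bits, seed=0):
    # Random hyperplanes (assumption): a learned hashing layer
    # would replace this in a trained model.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, projection):
    # One bit per hyperplane: 1 if the projection is non-negative, else 0.
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    # Number of differing bits between two integer-packed codes.
    return bin(a ^ b).count("1")


def retrieve(query_code, db_codes, k=3):
    # Indices of the k database codes closest to the query in Hamming distance.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]
```

Because each item is reduced to a fixed-length bit string, comparing a query against the database needs only XOR and bit counting, which is the usual source of the speed advantage that hashing methods claim.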
Taxonomy
Topics: Speech and dialogue systems
