Dissecting RGB-D Learning for Improved Multi-modal Fusion
Hao Chen, Haoran Zhou, Yunshu Zhang, Zheng Lin, Yongjian Deng

TL;DR
This paper introduces an analytical framework to dissect RGB-D multi-modal learning, revealing feature discrepancies and cooperation rules, and proposes a simple fusion strategy that improves performance across tasks.
Contribution
The paper presents a novel dissection method for RGB-D models, providing insights into feature interactions and proposing an effective fusion strategy based on these insights.
Findings
Discrepancy in cross-modal features identified
Hybrid cooperation rule enhances inference
Proposed fusion strategy improves multiple tasks
Abstract
In the RGB-D vision community, extensive research has focused on designing multi-modal learning strategies and fusion structures. However, the complementary and fusion mechanisms in RGB-D models remain a black box. In this paper, we present an analytical framework and a novel score to dissect RGB-D multi-modal learning. Our approach involves measuring the proposed semantic variance and feature similarity across modalities and levels, and conducting visual and quantitative analyses of multi-modal learning through comprehensive experiments. Specifically, we investigate the consistency and specialty of features across modalities, the evolution rules within each modality, and the collaboration logic used when optimizing an RGB-D model. Our studies reveal or verify several important findings, such as the discrepancy in cross-modal features and the hybrid multi-modal cooperation rule, which highlights…
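To make the idea of comparing features "across modalities and levels" concrete, here is a minimal sketch. It is not the paper's score: the abstract does not define its similarity measure, so plain cosine similarity stands in as an assumption. The function names, the level labels ("low", "high"), and the toy feature vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_modal_similarity(
    rgb_feats: Dict[str, List[Vector]],
    depth_feats: Dict[str, List[Vector]],
) -> Dict[str, float]:
    """Mean per-sample cosine similarity between RGB and depth features.

    Assumption: both dicts map a level name to one flattened feature
    vector per sample, with samples in the same order in both modalities.
    Levels missing from either modality, or empty, are skipped.
    """
    scores: Dict[str, float] = {}
    for level, rgb_level in rgb_feats.items():
        if level not in depth_feats:
            continue
        pairs = list(zip(rgb_level, depth_feats[level]))
        if not pairs:
            continue
        total = sum(cosine_similarity(r, d) for r, d in pairs)
        scores[level] = total / len(pairs)
    return scores


# Toy, hand-made features (hypothetical, not from the paper):
# orthogonal at the "low" level, parallel at the "high" level.
rgb = {"low": [[1.0, 0.0], [0.0, 1.0]], "high": [[1.0, 1.0], [2.0, 0.0]]}
depth = {"low": [[0.0, 1.0], [1.0, 0.0]], "high": [[2.0, 2.0], [1.0, 0.0]]}
scores = cross_modal_similarity(rgb, depth)
```

In a real analysis the vectors would come from matching layers of the RGB and depth branches of a trained model. A per-level profile like this is one simple way to see where the two modalities agree and where they diverge.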
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
