Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection
Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

TL;DR
This paper introduces a weakly-supervised RGB-D salient object detection model that uses mutual information regularization for multimodal representation learning and a multimodal variational auto-encoder for prediction refinement, achieving results competitive with fully supervised methods.
Contribution
It proposes a novel mutual information regularization framework for disentangled multimodal representation learning and a multimodal variational auto-encoder for prediction refinement in weakly-supervised RGB-D salient object detection.
Findings
Achieves comparable performance to fully supervised models on benchmark datasets.
Demonstrates effectiveness of mutual information regularization in multimodal learning.
Validates that the proposed stochastic refinement improves detection accuracy.
Abstract
In this paper, we present a weakly-supervised RGB-D salient object detection model via scribble supervision. Specifically, treating this as a multimodal learning task, we focus on effective multimodal representation learning via inter-modal mutual information regularization. In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection. Based on our multimodal representation learning framework, we introduce an asymmetric feature extractor for our multimodal data, which is proven more effective than the conventional symmetric backbone setting. We also introduce a multimodal variational auto-encoder as a stochastic prediction refinement technique, which takes pseudo labels from the first training stage…
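The abstract does not say which mutual information upper bound is used, so the following is only an illustrative sketch of one well-known choice, a CLUB-style (contrastive log-ratio) sampled estimate, written with NumPy. The function name `club_upper_bound`, the fixed-variance Gaussian approximation q(y|x), and the toy features are all assumptions made for illustration and are not taken from the paper. In a regularizer of this kind, `x` and `y` would stand for the RGB and depth features, and the estimate would be added to the loss so that minimizing it pushes the two representations apart.

```python
import numpy as np


def club_upper_bound(x, y, mu_fn, sigma=1.0):
    """CLUB-style sampled estimate of an upper bound on I(x; y).

    Assumes a Gaussian approximation q(y | x) = N(mu_fn(x), sigma^2 I).
    The estimate is E_{p(x,y)}[log q(y|x)] - E_{p(x)p(y)}[log q(y|x)];
    it is a valid upper bound only insofar as q approximates p(y|x) well.
    """
    mu = mu_fn(x)                                   # (N, D) predicted means
    # diff[i, j] = y_j - mu(x_i), covering every (x_i, y_j) pairing
    diff = y[None, :, :] - mu[:, None, :]           # (N, N, D)
    # log q(y_j | x_i) up to an additive constant that cancels below
    log_q = -0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2   # (N, N)
    positive = np.mean(np.diag(log_q))              # matched pairs (joint)
    negative = np.mean(log_q)                       # all pairs (product of marginals)
    return positive - negative


# Toy features standing in for two modalities (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 4))
y_dependent = x + 0.1 * rng.normal(size=(512, 4))   # strongly tied to x
y_independent = rng.normal(size=(512, 4))           # unrelated to x

identity = lambda features: features                # stand-in for a learned mean network
dependent = club_upper_bound(x, y_dependent, identity)
independent = club_upper_bound(x, y_independent, identity)
```

On this toy data the estimate is large when one feature set is nearly a copy of the other and close to zero when the two are independent, which is the behaviour a minimization regularizer relies on. In practice the mean function would be a small learned network rather than the identity used here.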
Taxonomy
Topics: Visual Attention and Saliency Detection · Face Recognition and Perception · Gaze Tracking and Assistive Technology
Methods: Focus
