Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

Aixuan Li; Yuxin Mao; Jing Zhang; Yuchao Dai

arXiv:2306.03630 · cs.CV · June 7, 2023 · 2 cites

TL;DR

This paper introduces a weakly-supervised RGB-D salient object detection model that uses mutual information regularization for multimodal representation learning and prediction refinement, achieving results competitive with fully supervised methods.

Contribution

It proposes a novel mutual information regularization framework for disentangled multimodal representation learning and a multimodal variational auto-encoder for prediction refinement in weakly-supervised RGB-D salient object detection.

Findings

01. Achieves performance comparable to fully supervised models on benchmark datasets.

02. Demonstrates the effectiveness of mutual information regularization in multimodal learning.

03. Validates that the proposed stochastic refinement improves detection accuracy.

Abstract

In this paper, we present a weakly-supervised RGB-D salient object detection model via scribble supervision. Since this is a multimodal learning task, we focus on effective multimodal representation learning via inter-modal mutual information regularization. In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection. Based on our multimodal representation learning framework, we introduce an asymmetric feature extractor for our multimodal data, which is proven more effective than the conventional symmetric backbone setting. We also introduce a multimodal variational auto-encoder as a stochastic prediction refinement technique, which takes pseudo labels from the first training stage…
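The abstract does not say which mutual information upper bound is used, so the following is only a minimal sketch of the general idea, assuming a CLUB-style (contrastive log-ratio upper bound) estimator. The scalar "RGB" and "depth" features, the fixed Gaussian conditional q(y | x), and the names `gaussian_log_prob` and `club_upper_bound` are all hypothetical choices made for illustration; in a real model the features would be network outputs and q would be learned. The code is framework-free so it runs with the standard library alone.

```python
import math
import random


def gaussian_log_prob(y, mean, std):
    """Log density of a scalar Gaussian N(mean, std^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (y - mean) ** 2 / (2.0 * std ** 2)


def club_upper_bound(xs, ys, predict_mean, std):
    """Sampled CLUB-style estimate of an upper bound on I(x; y).

    Computes the mean of log q(y_i | x_i) over paired samples minus
    the mean of log q(y_j | x_i) over all n * n cross pairs.
    """
    n = len(xs)
    paired = sum(gaussian_log_prob(y, predict_mean(x), std) for x, y in zip(xs, ys)) / n
    crossed = sum(gaussian_log_prob(y, predict_mean(x), std) for x in xs for y in ys) / (n * n)
    return paired - crossed


rng = random.Random(0)
n = 200
rgb = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical scalar "RGB" features

# Entangled case: the "depth" feature is almost a copy of the "RGB" feature.
# Assumption: q(y | x) = N(x, 0.1^2), which matches how the data is generated.
depth_entangled = [x + 0.1 * rng.gauss(0.0, 1.0) for x in rgb]
dependent = club_upper_bound(rgb, depth_entangled, lambda x: x, 0.1)

# Disentangled case: the "depth" feature is drawn independently of "RGB".
# Assumption: q(y | x) = N(0, 1), which ignores x entirely.
depth_independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
independent = club_upper_bound(rgb, depth_independent, lambda x: 0.0, 1.0)

print(f"entangled bound:    {dependent:.3f}")
print(f"disentangled bound: {independent:.3f}")
```

Under these assumptions the estimate is large when one modality's feature largely duplicates the other and is essentially zero when the two are independent. Used as a regularizer, a quantity of this kind would be added to the task loss with a weighting coefficient so that minimizing it pushes the two modality representations apart.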

Code & Models

Repositories

baneitixiaomai/mirv (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Face Recognition and Perception · Gaze Tracking and Assistive Technology

Methods: Focus