Understanding Dark Scenes by Contrasting Multi-Modal Observations

Xiaoyu Dong; Naoto Yokoya

arXiv:2308.12320 · cs.CV · November 22, 2023 · 1 citation

TL;DR

This paper proposes a supervised multi-modal contrastive learning method to improve dark scene understanding by enhancing semantic discriminability in multi-modal feature spaces, outperforming previous approaches.

Contribution

It introduces a novel contrastive learning framework that jointly performs cross-modal and intra-modal contrast to better utilize multi-modal data for dark scene analysis.

Findings

1. Achieves state-of-the-art performance on dark scene understanding tasks.

2. Effectively enhances semantic discriminability in multi-modal feature spaces.

3. Demonstrates robustness across diverse lighting conditions and modalities.

Abstract

Understanding dark scenes based on multi-modal image data is challenging, as both the visible and auxiliary modalities provide limited semantic information for the task. Previous methods focus on fusing the two modalities but neglect the correlations among semantic classes when minimizing losses to align pixels with labels, resulting in inaccurate class predictions. To address these issues, we introduce a supervised multi-modal contrastive learning approach to increase the semantic discriminability of the learned multi-modal feature spaces by jointly performing cross-modal and intra-modal contrast under the supervision of the class correlations. The cross-modal contrast encourages same-class embeddings from across the two modalities to be closer and pushes different-class ones apart. The intra-modal contrast forces same-class or different-class embeddings within each modality to be…
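The joint cross-modal and intra-modal contrast described in the abstract can be illustrated with a minimal sketch, assuming a standard supervised InfoNCE-style loss over labeled embeddings. The function names (`supervised_contrast`, `joint_loss`), the temperature value, and the equal weighting of the terms are illustrative assumptions and are not taken from the paper or its repository. The sketch uses plain class labels only and does not model the class-correlation supervision the abstract mentions.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrast(anchors, anchor_labels, others, other_labels,
                        temperature=0.1, skip_self=False):
    """Average supervised contrastive loss of `anchors` against `others`.

    Same-label pairs are positives, different-label pairs are negatives.
    Set skip_self=True when both sets are the same modality, so that an
    embedding is not contrasted with itself.
    """
    anchors = [normalize(a) for a in anchors]
    others = [normalize(o) for o in others]
    total, count = 0.0, 0
    for i, (a, label_a) in enumerate(zip(anchors, anchor_labels)):
        # Scaled similarities to every candidate, paired with its label.
        cands = [(dot(a, o) / temperature, label_o)
                 for j, (o, label_o) in enumerate(zip(others, other_labels))
                 if not (skip_self and i == j)]
        positives = [s for s, label_o in cands if label_o == label_a]
        if not positives:
            continue  # no same-class candidate for this anchor
        # Numerically stable log-sum-exp over all candidates.
        m = max(s for s, _ in cands)
        log_denom = m + math.log(sum(math.exp(s - m) for s, _ in cands))
        total += -sum(s - log_denom for s in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


def joint_loss(visible, auxiliary, labels, temperature=0.1):
    """Sum of cross-modal and intra-modal terms (equal weights assumed)."""
    cross = (supervised_contrast(visible, labels, auxiliary, labels, temperature)
             + supervised_contrast(auxiliary, labels, visible, labels, temperature))
    intra = (supervised_contrast(visible, labels, visible, labels,
                                 temperature, skip_self=True)
             + supervised_contrast(auxiliary, labels, auxiliary, labels,
                                   temperature, skip_self=True))
    return cross + intra


# Toy embeddings: two clusters per modality, four "pixels" in total.
visible = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
auxiliary = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
aligned_loss = joint_loss(visible, auxiliary, [0, 0, 1, 1])
misaligned_loss = joint_loss(visible, auxiliary, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the labels agree with the cluster structure (`aligned_loss`) than when they cut across it (`misaligned_loss`), which is the behavior a contrastive objective of this kind is meant to reward.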

Code & Models

Repositories

palmdong/smmcl (PyTorch, official)

Videos

Understanding Dark Scenes by Contrasting Multi-Modal Observations (YouTube)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Learning · ALIGN · Focus