Understanding Dark Scenes by Contrasting Multi-Modal Observations
Xiaoyu Dong, Naoto Yokoya

TL;DR
This paper proposes a supervised multi-modal contrastive learning method to improve dark scene understanding by enhancing semantic discriminability in multi-modal feature spaces, outperforming previous approaches.
Contribution
It introduces a novel contrastive learning framework that jointly performs cross-modal and intra-modal contrast to better utilize multi-modal data for dark scene analysis.
Findings
Achieves state-of-the-art performance on dark scene understanding tasks.
Effectively enhances semantic discriminability in multi-modal feature spaces.
Demonstrates robustness across diverse lighting conditions and modalities.
Abstract
Understanding dark scenes based on multi-modal image data is challenging, as both the visible and auxiliary modalities provide limited semantic information for the task. Previous methods focus on fusing the two modalities but neglect the correlations among semantic classes when minimizing losses to align pixels with labels, resulting in inaccurate class predictions. To address these issues, we introduce a supervised multi-modal contrastive learning approach to increase the semantic discriminability of the learned multi-modal feature spaces by jointly performing cross-modal and intra-modal contrast under the supervision of the class correlations. The cross-modal contrast encourages same-class embeddings from across the two modalities to be closer and pushes different-class ones apart. The intra-modal contrast forces same-class or different-class embeddings within each modality to be…
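The cross-modal and intra-modal contrast described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard supervised InfoNCE-style loss, treats the intra-modal term the same way as the cross-modal one (the abstract is truncated at that point), and omits the class-correlation supervision entirely. The function names, the `temperature` value, and the equal weighting of the two terms are all hypothetical choices made for illustration.

```python
import math


def _normalize(v):
    # L2-normalize so that dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrast(anchors, anchor_labels, others, other_labels,
                        temperature=0.1, skip_self=False):
    """Mean supervised InfoNCE-style loss of `anchors` against `others`.

    Candidates sharing the anchor's label are positives; the rest are
    negatives. `skip_self=True` drops the candidate at the anchor's own
    index, which is needed when both sets are the same modality.
    """
    anchors = [_normalize(a) for a in anchors]
    others = [_normalize(o) for o in others]
    total, count = 0.0, 0
    for i, (a, label_a) in enumerate(zip(anchors, anchor_labels)):
        candidates = [
            (_dot(a, o) / temperature, label_o)
            for j, (o, label_o) in enumerate(zip(others, other_labels))
            if not (skip_self and i == j)
        ]
        positives = [s for s, label_o in candidates if label_o == label_a]
        if not positives:
            continue  # no same-class candidate for this anchor
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(s for s, _ in candidates)
        log_denom = m + math.log(sum(math.exp(s - m) for s, _ in candidates))
        total += -sum(s - log_denom for s in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


def multimodal_contrast_loss(vis, aux, labels, temperature=0.1):
    """Cross-modal plus intra-modal contrast (equal weights assumed)."""
    cross = 0.5 * (
        supervised_contrast(vis, labels, aux, labels, temperature)
        + supervised_contrast(aux, labels, vis, labels, temperature)
    )
    intra = 0.5 * (
        supervised_contrast(vis, labels, vis, labels, temperature, skip_self=True)
        + supervised_contrast(aux, labels, aux, labels, temperature, skip_self=True)
    )
    return cross + intra


# Toy per-pixel embeddings for two classes in each modality (made-up values).
labels = [0, 0, 1, 1]
separated_vis = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
separated_aux = [[1.0, 0.05], [0.95, 0.0], [0.05, 1.0], [0.0, 0.95]]
mixed_vis = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
mixed_aux = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_separated = multimodal_contrast_loss(separated_vis, separated_aux, labels)
loss_mixed = multimodal_contrast_loss(mixed_vis, mixed_aux, labels)
```

In this toy setup, embeddings that cluster by class in both modalities yield a lower loss than embeddings whose classes overlap, which is the behavior the abstract attributes to the contrast terms. In practice the embeddings would come from the two modality encoders and the loss would be computed on sampled pixels with a tensor library rather than Python lists.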
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Learning · ALIGN · Focus
