TL;DR
This paper introduces CroDiNo-KD, a novel framework for RGBD semantic segmentation that uses disentanglement and contrastive learning to improve single-modality models, addressing limitations of traditional cross-modal knowledge distillation.
Contribution
The paper proposes CroDiNo-KD, a new approach that learns single-modality models from multi-modal data using disentanglement, contrastive learning, and decoupled data augmentation.
Findings
CroDiNo-KD outperforms recent Cross-Modal Knowledge Distillation (CMKD) frameworks on three RGBD datasets.
It demonstrates the effectiveness of disentanglement in cross-modal knowledge transfer.
The approach suggests a new perspective on distilling multi-modal information into single-modality models.
Abstract
Multi-modal RGB and Depth (RGBD) data are predominant in many domains such as robotics, autonomous driving and remote sensing. The combination of these multi-modal data enhances environmental perception by providing 3D spatial context, which is absent in standard RGB images. Although RGBD multi-modal data can be available to train computer vision models, accessing all sensor modalities during the inference stage may be infeasible due to sensor failures or resource constraints, leading to a mismatch between data modalities available during training and inference. Traditional Cross-Modal Knowledge Distillation (CMKD) frameworks, developed to address this task, are typically based on a teacher/student paradigm, where a multi-modal teacher distills knowledge into a single-modality student model. However, these approaches face challenges in teacher architecture choices and distillation…
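For readers unfamiliar with the teacher/student paradigm the abstract refers to, the sketch below illustrates a generic per-pixel distillation loss. It is not CroDiNo-KD and not taken from the paper: the function names, the temperature value, the tensor shapes, and the random logits are all assumptions made for illustration. It shows only the conventional baseline idea, in which a single-modality student is pushed to match the softened class probabilities of a multi-modal teacher.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean per-pixel KL(teacher || student) on temperature-softened outputs.

    Both inputs are assumed to have shape (H, W, num_classes).
    The temperature of 2.0 is an illustrative default, not a value from the paper.
    """
    p_teacher = softmax(teacher_logits / temperature)
    p_student = softmax(student_logits / temperature)
    eps = 1e-12  # avoids log(0)
    kl = (p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))).sum(axis=-1)
    # Scaling by temperature**2 is the usual convention for softened targets.
    return float(kl.mean() * temperature ** 2)

# Hypothetical stand-ins: a teacher that saw RGB + depth, a student that saw RGB only.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 4, 3))
student_logits = rng.normal(size=(4, 4, 3))

loss_mismatch = distillation_loss(student_logits, teacher_logits)
loss_match = distillation_loss(teacher_logits, teacher_logits)
print(loss_mismatch, loss_match)
```

In practice a term like this is usually added to the ordinary segmentation loss of the student. The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge.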
Taxonomy
Methods: Contrastive Learning · Knowledge Distillation
