Calibrating Class Weights with Multi-Modal Information for Partial Video Domain Adaptation
Xiyu Wang, Yuecong Xu, Kezhi Mao, Jianfei Yang

TL;DR
This paper introduces MCAN, a novel method for partial video domain adaptation that leverages multi-modal features and a class weight calibration technique to improve cross-domain video classification accuracy.
Contribution
The paper proposes MCAN, which enhances feature extraction with multi-modal data and introduces a class weight calibration method to reduce negative transfer in PVDA.
Findings
MCAN outperforms state-of-the-art PVDA methods on benchmark datasets.
Multi-modal features improve robustness against domain shifts.
Calibration effectively reduces negative transfer caused by incorrect class weights.
Abstract
Assuming the source label space subsumes the target one, Partial Video Domain Adaptation (PVDA) is a more general and practical scenario for cross-domain video classification problems. The key challenge of PVDA is to mitigate the negative transfer caused by the source-only outlier classes. To tackle this challenge, a crucial step is to aggregate target predictions to assign class weights by up-weighting target classes and down-weighting outlier classes. However, incorrect predictions of class weights can mislead the network and lead to negative transfer. Previous works improve class weight accuracy by utilizing temporal features and attention mechanisms, but these methods may fall short of generating accurate class weights when domain shifts are significant, as in most real-world scenarios. To deal with these challenges, we propose the Multi-modality Cluster-calibrated…
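The class-weighting step mentioned in the abstract (aggregating target predictions so that shared classes are up-weighted and source-only outlier classes are down-weighted) can be illustrated with a minimal sketch. It assumes the common partial domain adaptation recipe of averaging softmax outputs over target samples and scaling by the largest value; it is not MCAN's calibration procedure, which the truncated abstract does not spell out. The function names and the logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_weights(target_logits):
    """Average target softmax predictions per source class, then scale so the
    largest weight is 1. Classes the target rarely predicts get small weights."""
    num_classes = len(target_logits[0])
    sums = [0.0] * num_classes
    for logits in target_logits:
        for k, p in enumerate(softmax(logits)):
            sums[k] += p
    mean = [s / len(target_logits) for s in sums]
    largest = max(mean)
    return [m / largest for m in mean]


# Hypothetical logits for four target videos over three source classes.
# Class 2 plays the role of a source-only outlier: no target video favours it.
target_logits = [
    [4.0, 1.0, 0.0],
    [3.5, 0.5, 0.2],
    [0.5, 3.0, 0.1],
    [0.2, 4.0, 0.0],
]
weights = class_weights(target_logits)
print(weights)
```

In this toy setting the weight for class 2 comes out far below the other two, so its source samples would contribute little to training. The failure mode the abstract describes is the case where the target predictions themselves are wrong under a large domain shift, which makes weights computed this way unreliable.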
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
