Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition

Masashi Hatano; Ryo Hachiuma; Ryo Fujii; Hideo Saito

arXiv:2405.19917 · cs.CV · July 17, 2024 · 1 citation

TL;DR

This paper introduces MM-CDFSL, a multimodal, domain-adaptive, and computationally efficient method for cross-domain few-shot egocentric action recognition. It uses multimodal distillation to improve accuracy on the target domain and ensemble masked inference to improve inference speed.

Contribution

The paper proposes a novel multimodal distillation framework with ensemble masked inference for cross-domain few-shot egocentric action recognition, addressing both the domain gap between source and target videos and the computational cost of inference.
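
The abstract captured on this page is truncated before it describes the inference side, so the following is only a hedged sketch of what "ensemble masked inference" could look like. It assumes the idea is: drop a random subset of input tokens, run the model on each masked view, and average the class logits over several masks. The function names, the `keep_ratio` and `num_masks` parameters, and the toy mean-pooling model are hypothetical illustrations, not the authors' code or hyperparameters.

```python
import random
from typing import Callable, List, Sequence

Token = Sequence[float]
# Hypothetical model interface: a list of token vectors in, class logits out.
Model = Callable[[List[Token]], List[float]]


def ensemble_masked_inference(
    tokens: Sequence[Token],
    model: Model,
    keep_ratio: float = 0.5,
    num_masks: int = 4,
    seed: int = 0,
) -> List[float]:
    """Average logits over several randomly masked token subsets (sketch only)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if num_masks < 1:
        raise ValueError("num_masks must be >= 1")
    rng = random.Random(seed)  # fixed seed makes the masks reproducible
    n_keep = max(1, round(len(tokens) * keep_ratio))
    summed: List[float] = []
    for _ in range(num_masks):
        # Sample which token positions survive this mask, keeping their order.
        kept = sorted(rng.sample(range(len(tokens)), n_keep))
        logits = model([tokens[i] for i in kept])
        summed = list(logits) if not summed else [s + x for s, x in zip(summed, logits)]
    # Ensemble step: average the per-mask logits.
    return [s / num_masks for s in summed]


def mean_pool_model(tokens: List[Token]) -> List[float]:
    """Toy stand-in for a backbone plus classifier: mean-pool each dimension."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


if __name__ == "__main__":
    clip_tokens = [[float(i % 3), float(i % 5)] for i in range(16)]
    print(ensemble_masked_inference(clip_tokens, mean_pool_model, keep_ratio=0.25))
```

Each forward pass in this sketch sees only `n_keep` tokens. Whether the whole ensemble ends up faster than a single unmasked pass depends on how the backbone's cost scales with token count and on `num_masks`, which the sketch does not model.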

Findings

1. Outperforms state-of-the-art CD-FSL methods by over 6 points in accuracy.
2. Achieves 2.2 times faster inference.
3. Adapts effectively to the target domain using only unlabeled target data.

Abstract

We address a novel cross-domain few-shot learning (CD-FSL) task with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domain) and (2) the computational cost for real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach designed to enhance adaptability to the target domain and reduce inference cost. To address the first challenge, we propose distilling multimodal knowledge from teacher models into a student RGB model. Each teacher model is trained independently on source and target data for its respective modality. Leveraging only unlabeled target data during multimodal distillation enhances the…
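
To make the distillation step concrete, here is a hedged sketch of one way per-modality teachers could supervise an RGB student on an unlabeled target clip. It assumes a standard temperature-scaled, logit-level knowledge-distillation loss averaged over teachers. The truncated abstract does not say whether the paper distills logits or features, nor how teachers are weighted, so the loss form, the `temperature` and `weights` parameters, and the modality names (`"flow"`, `"pose"`) are all hypothetical. The function computes only the loss value; a real implementation would use an autodiff framework.

```python
import math
from typing import Dict, List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def multimodal_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Dict[str, Sequence[float]],
    temperature: float = 2.0,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted KD loss from each modality teacher to the RGB student (sketch).

    No action labels are used, so this can be computed on unlabeled target clips.
    """
    if not teacher_logits:
        raise ValueError("need at least one teacher")
    if weights is None:
        # Assumption: equal weight for every teacher modality.
        weights = {name: 1.0 / len(teacher_logits) for name in teacher_logits}
    p_student = softmax(student_logits, temperature)
    loss = 0.0
    for name, t_logits in teacher_logits.items():
        p_teacher = softmax(t_logits, temperature)
        # T^2 factor follows the usual KD convention for temperature-scaled targets.
        loss += weights[name] * temperature ** 2 * kl_divergence(p_teacher, p_student)
    return loss


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]  # RGB student logits for one unlabeled clip
    teachers = {"flow": [1.5, 1.0, -0.5], "pose": [2.5, 0.0, -1.5]}
    print(multimodal_distillation_loss(student, teachers))
```

In this sketch only the student needs RGB input, so the teacher modalities are used during training and are not required when the student makes predictions.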

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis