MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition

Xinyu Gong; Sreyas Mohan; Naina Dhingra; Jean-Charles Bazin; Yilei Li; Zhangyang Wang; Rakesh Ranjan

arXiv:2305.07214 · cs.CV · May 15, 2023 · 1 citation

TL;DR

This paper introduces the MMG-Ego4D dataset and methods to study how egocentric action recognition systems can generalize across missing or disjoint modalities, advancing multimodal generalization research.

Contribution

The paper presents a new dataset, MMG-Ego4D, and novel methods for multimodal generalization in egocentric action recognition, addressing missing and disjoint modalities.

Findings

01. Proposed a new fusion module with modality dropout.

02. Achieved improved few-shot generalization performance.

03. Established a benchmark for multimodal generalization in egocentric videos.
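The idea behind modality dropout can be illustrated with a minimal sketch. It assumes each modality is already encoded as a fixed-length feature vector, and it uses plain averaging as a stand-in for fusion; the paper's actual fusion module is not described on this page. The names `modality_dropout`, `mean_fuse`, and `p_drop`, and the modality keys `video` and `audio`, are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Optional

Features = Dict[str, Optional[List[float]]]


def modality_dropout(features: Dict[str, List[float]],
                     p_drop: float,
                     rng: random.Random) -> Features:
    """Mark each modality as missing (None) with probability p_drop.

    At least one modality is always kept so the fused input is never empty.
    This is an illustrative assumption, not the authors' exact procedure.
    """
    names = list(features)
    kept = [name for name in names if rng.random() >= p_drop]
    if not kept:
        # Everything was dropped: keep one modality chosen at random.
        kept = [rng.choice(names)]
    return {name: (features[name] if name in kept else None) for name in names}


def mean_fuse(features: Features) -> List[float]:
    """Average the feature vectors of the modalities that are present.

    Averaging is a simple stand-in for a learned fusion module.
    """
    present = [vec for vec in features.values() if vec is not None]
    dim = len(present[0])
    return [sum(vec[i] for vec in present) / len(present) for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    feats = {"video": [1.0, 3.0], "audio": [3.0, 5.0]}
    dropped = modality_dropout(feats, p_drop=0.5, rng=rng)
    print(dropped, mean_fuse(dropped))
```

Training a fusion step on randomly reduced inputs like this is what lets a model tolerate a modality being absent at inference time, because it has already seen every subset of its inputs during training.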

Abstract

In this paper, we study a novel problem in egocentric action recognition, which we term "Multimodal Generalization" (MMG). MMG aims to study how systems can generalize when data from certain modalities is limited or even completely missing. We thoroughly investigate MMG in the context of standard supervised action recognition and the more challenging few-shot setting for learning new action categories. MMG consists of two novel scenarios, designed to support security and efficiency considerations in real-world applications: (1) missing modality generalization, where some modalities that were present at training time are missing at inference time, and (2) cross-modal zero-shot generalization, where the modalities present at inference time and at training time are disjoint. To enable this investigation, we construct a new dataset, MMG-Ego4D, containing data points…
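The two scenarios differ only in how the inference-time modality set relates to the training-time set, which can be sketched as a small set comparison. The function name `mmg_scenario`, the returned labels, and the modality names in the usage lines are illustrative assumptions, not identifiers from the paper.

```python
from typing import Iterable


def mmg_scenario(train_modalities: Iterable[str],
                 test_modalities: Iterable[str]) -> str:
    """Classify an evaluation setup by comparing modality sets.

    - "missing modality": inference uses a strict subset of the training modalities.
    - "cross-modal zero-shot": training and inference modalities are disjoint.
    - "standard": both sets are identical.
    - "other": partial overlap that matches neither MMG scenario.
    """
    train = frozenset(train_modalities)
    test = frozenset(test_modalities)
    if not train or not test:
        raise ValueError("both modality sets must be non-empty")
    if train.isdisjoint(test):
        return "cross-modal zero-shot"
    if test < train:
        return "missing modality"
    if test == train:
        return "standard"
    return "other"


if __name__ == "__main__":
    print(mmg_scenario(["video", "audio"], ["video"]))
    print(mmg_scenario(["video"], ["audio"]))
```

Under these definitions the first call falls in the missing modality scenario and the second in the cross-modal zero-shot scenario.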

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Multimodal Machine Learning Applications

Methods: Dropout