Exploring Missing Modality in Multimodal Egocentric Datasets

Merey Ramazanova; Alejandro Pardo; Humam Alwassel; Bernard Ghanem

arXiv:2401.11470 · cs.CV · April 18, 2024 · 2 cites

Open Access

TL;DR

This paper introduces a Missing Modality Token (MMT) that lets transformer-based egocentric video models retain performance when a modality is absent. On Ego4D, Epic-Kitchens, and Epic-Sounds, it cuts the performance drop from ~30% to ~10% when half of the test set is modal-incomplete.

Contribution

The study proposes the MMT method, a novel strategy that enhances transformer-based models to handle missing modalities effectively in egocentric video analysis.

Findings

01 Reduces the performance drop from ~30% to ~10% when half of the test set is modal-incomplete

02 Demonstrates MMT's effectiveness across the Ego4D, Epic-Kitchens, and Epic-Sounds datasets

03 Shows MMT's adaptability to different training scenarios

Abstract

Multimodal video understanding is crucial for analyzing egocentric videos, where integrating multiple sensory signals significantly enhances action recognition and moment localization. However, practical applications often grapple with incomplete modalities due to factors like privacy concerns, efficiency demands, or hardware malfunctions. Addressing this, our study delves into the impact of missing modalities on egocentric action recognition, particularly within transformer-based models. We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent, a strategy that proves effective in the Ego4D, Epic-Kitchens, and Epic-Sounds datasets. Our method mitigates the performance loss, reducing it from its original $\sim 30\%$ drop to only $\sim 10\%$ when half of the test set is modal-incomplete. Through extensive experimentation, we…
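The abstract does not spell out how the token is wired into the model, so the following is only a minimal sketch of one plausible reading: when a modality is absent, its input tokens are replaced by copies of a single shared placeholder vector, so the transformer always sees a sequence of the same length. All names here (`make_mmt`, `fill_missing_modality`, `build_input`) are hypothetical, and the random initialization stands in for what would be a learned parameter in a real model.

```python
import random

def make_mmt(dim, seed=0):
    """Create a placeholder token vector.

    Assumption: in a real model this would be a trainable parameter;
    here it is a small random vector used purely for illustration.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.02) for _ in range(dim)]

def fill_missing_modality(tokens, num_tokens, mmt):
    """Return the modality's tokens, or num_tokens copies of the MMT
    when the modality is absent (signalled by None)."""
    if tokens is None:
        return [list(mmt) for _ in range(num_tokens)]
    return tokens

def build_input(video_tokens, audio_tokens, num_audio_tokens, mmt):
    """Concatenate video and audio tokens, substituting the MMT for
    missing audio so the sequence length stays fixed."""
    audio = fill_missing_modality(audio_tokens, num_audio_tokens, mmt)
    return video_tokens + audio

dim = 4
mmt = make_mmt(dim)
video = [[1.0] * dim for _ in range(3)]   # three toy video tokens
audio = [[2.0] * dim for _ in range(2)]   # two toy audio tokens

complete = build_input(video, audio, 2, mmt)     # both modalities present
incomplete = build_input(video, None, 2, mmt)    # audio missing
```

Both `complete` and `incomplete` have the same number of tokens, which is the point of the substitution: the downstream model needs no architectural change to accept modal-incomplete samples. Which modality is dropped, and where in the network the substitution happens, are assumptions of this sketch rather than details taken from the abstract.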

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Sparse Evolutionary Training