MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning

Rex Liu; Xin Liu

arXiv:2408.04243·cs.CV·August 9, 2024

TL;DR

Mu-MAE introduces a self-supervised multimodal autoencoder with a novel masking strategy, enabling effective one-shot human activity recognition from video and sensor data without external datasets.

Contribution

The paper proposes Mu-MAE, a multimodal masked autoencoder with synchronized masking for self-supervised pretraining and a cross-attention fusion layer for improved one-shot classification.
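As a rough illustration of the cross-attention fusion idea mentioned above, the following is a minimal single-head sketch in plain Python. It is not the paper's architecture: it assumes queries come from one modality and keys/values from the other, omits learned projection matrices, and uses hypothetical names (`cross_attention`, `video_tokens`, `sensor_tokens`) and toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries: token vectors from one modality.
    keys, values: token vectors from the other modality.
    Returns one fused vector per query token.
    """
    d = len(keys[0])
    out_dim = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query token to every key token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return fused


# Toy embeddings (assumed for illustration only).
video_tokens = [[1.0, 0.0], [0.0, 1.0]]
sensor_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Video tokens attend to sensor tokens.
fused = cross_attention(video_tokens, sensor_tokens, sensor_tokens)
```

Each fused vector is a weighted average of the sensor tokens, with weights set by how similar each sensor token is to the querying video token. A practical fusion layer would add learned query, key, and value projections and typically multiple heads.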

Findings

01. Achieves up to 80.17% accuracy on MMAct one-shot classification

02. Outperforms existing approaches without using external data

03. Learns effective spatiotemporal features through a novel masking strategy

Abstract

With the exponential growth of multimedia data, leveraging multimodal sensors presents a promising approach for improving accuracy in human activity recognition. Nevertheless, accurately identifying these activities using both video data and wearable sensor data is challenging because of labor-intensive data annotation and reliance on external pretrained models or additional data. To address these challenges, we introduce Multimodal Masked Autoencoders-Based One-Shot Learning (Mu-MAE). Mu-MAE integrates a multimodal masked autoencoder with a synchronized masking strategy tailored for wearable sensors. This masking strategy compels the networks to capture more meaningful spatiotemporal features, which enables effective self-supervised pretraining without the need for external data. Furthermore, Mu-MAE leverages the representation extracted from multimodal masked autoencoders as…
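The abstract does not spell out how the synchronized masking works, so the sketch below shows only one plausible reading: the same randomly chosen time steps are hidden in every wearable-sensor stream, so a masked value cannot be trivially recovered from another sensor at the same instant. The function names, the 50% mask ratio, the zero fill value, and the sensor readings are all assumptions made for illustration.

```python
import random


def synchronized_mask(num_steps, mask_ratio, seed=None):
    """Pick the time-step indices to mask, shared by all sensor streams."""
    rng = random.Random(seed)
    num_masked = int(num_steps * mask_ratio)
    return sorted(rng.sample(range(num_steps), num_masked))


def apply_mask(stream, masked_steps, fill=0.0):
    """Replace the values at the masked time steps with a fill value.

    A real masked autoencoder would usually drop masked tokens from the
    encoder input instead; zero filling keeps this sketch simple.
    """
    masked = set(masked_steps)
    return [fill if t in masked else x for t, x in enumerate(stream)]


# Hypothetical single-axis readings from two wearable sensors.
streams = {
    "accelerometer": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "gyroscope": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8],
}

# One set of indices is drawn once and reused for every stream.
masked_steps = synchronized_mask(num_steps=8, mask_ratio=0.5, seed=0)
masked_streams = {name: apply_mask(s, masked_steps) for name, s in streams.items()}
```

Drawing the indices once and reusing them is what makes the mask synchronized; sampling independently per stream would let the model fill in a masked reading from a correlated sensor that is still visible at the same time step.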

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need