Align and Adapt: Multimodal Multiview Human Activity Recognition under Arbitrary View Combinations
Duc-Anh Nguyen, Nhien-An Le-Khac

TL;DR
The paper introduces AliAd, a flexible multimodal multiview human activity recognition model that effectively handles arbitrary view combinations using contrastive learning and a mixture-of-experts approach, improving performance and efficiency.
Contribution
AliAd is the first model to support arbitrary view configurations in multiview human activity recognition through a novel contrastive loss and mixture-of-experts module.
Findings
Achieves high accuracy on four diverse datasets.
Supports arbitrary view combinations with reduced computational complexity.
Effectively handles missing and heterogeneous views.
Abstract
Multimodal multiview learning seeks to integrate information from diverse sources to enhance task performance. Existing approaches often struggle with flexible view configurations, including arbitrary view combinations, numbers of views, and heterogeneous modalities. Focusing on the context of human activity recognition, we propose AliAd, a model that combines multiview contrastive learning with a mixture-of-experts module to support arbitrary view availability during both training and inference. Instead of trying to reconstruct missing views, an adjusted center contrastive loss is used for self-supervised representation learning and view alignment, mitigating the impact of missing views on multiview fusion. This loss formulation allows for the integration of view weights to account for view quality. Additionally, it reduces computational complexity…
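To make the idea of aligning views to a shared center concrete, the following is a minimal sketch of a generic center-based contrastive loss. It is not the paper's adjusted loss, whose exact form is not given in the abstract. The function names, the InfoNCE-style formulation, the temperature value, and the view names "imu" and "video" are all assumptions made for illustration. Each sample carries only the views that are available, each available view is pulled toward the weighted center of its own sample and pushed away from the centers of other samples, and per-view weights stand in for view quality. The sketch assumes every sample has at least one available view.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_center(views, weights):
    """Weighted mean of the available view embeddings of one sample.

    views:   dict mapping view name -> embedding (available views only)
    weights: dict mapping view name -> non-negative quality weight (hypothetical)
    """
    total = sum(weights[name] for name in views)
    dim = len(next(iter(views.values())))
    center = [0.0] * dim
    for name, emb in views.items():
        w = weights[name] / total
        for i, x in enumerate(emb):
            center[i] += w * x
    return l2_normalize(center)


def center_contrastive_loss(batch, weights, temperature=0.1):
    """InfoNCE-style loss between each available view and the sample centers.

    batch: list of dicts, one per sample; a missing view is simply absent.
    The positive for a view is the center of its own sample; the centers of
    the other samples in the batch act as negatives.
    """
    normed = [{k: l2_normalize(v) for k, v in s.items()} for s in batch]
    centers = [weighted_center(s, weights) for s in normed]
    total_loss, total_weight = 0.0, 0.0
    for i, sample in enumerate(normed):
        for name, emb in sample.items():
            logits = [dot(emb, c) / temperature for c in centers]
            # Numerically stable log-sum-exp over all centers.
            m = max(logits)
            log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
            total_loss += weights[name] * (log_denom - logits[i])
            total_weight += weights[name]
    return total_loss / total_weight


# Hypothetical two-sample batch; the second sample is missing its "video" view.
weights = {"imu": 1.0, "video": 0.5}
aligned = [{"imu": [1.0, 0.0], "video": [1.0, 0.0]}, {"imu": [0.0, 1.0]}]
misaligned = [{"imu": [1.0, 0.0], "video": [0.0, 1.0]}, {"imu": [0.0, 1.0]}]
loss_aligned = center_contrastive_loss(aligned, weights)
loss_misaligned = center_contrastive_loss(misaligned, weights)
```

In this sketch a sample with V available views needs V view-to-center comparisons, whereas contrasting every pair of views would need V(V-1)/2. Missing views never enter the sums, so no reconstruction step is required.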
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Domain Adaptation and Few-Shot Learning
