Align and Adapt: Multimodal Multiview Human Activity Recognition under Arbitrary View Combinations

Duc-Anh Nguyen; Nhien-An Le-Khac

arXiv:2602.08755 · cs.LG · February 19, 2026

Open Access

TL;DR

The paper introduces AliAd, a flexible multimodal multiview human activity recognition model that effectively handles arbitrary view combinations using contrastive learning and a mixture-of-experts approach, improving performance and efficiency.

Contribution

AliAd is the first model to support arbitrary view configurations in multiview human activity recognition through a novel contrastive loss and mixture-of-experts module.

Findings

01

Achieves high accuracy on four diverse datasets.

02

Supports arbitrary view combinations with reduced computational complexity.

03

Effectively handles missing and heterogeneous views.

Abstract

Multimodal multiview learning seeks to integrate information from diverse sources to enhance task performance. Existing approaches often struggle with flexible view configurations, including arbitrary view combinations, numbers of views, and heterogeneous modalities. Focusing on the context of human activity recognition, we propose AliAd, a model that combines multiview contrastive learning with a mixture-of-experts module to support arbitrary view availability during both training and inference. Instead of trying to reconstruct missing views, an adjusted center contrastive loss is used for self-supervised representation learning and view alignment, mitigating the impact of missing views on multiview fusion. This loss formulation allows for the integration of view weights to account for view quality. Additionally, it reduces computational complexity from $O(V^{2})$ to $O(V)$, where $V$ is…
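The abstract does not spell out the adjusted center contrastive loss, so the following is only a minimal sketch of the general idea, assuming an InfoNCE-style objective in which each available view is pulled toward a weighted per-sample center and pushed away from the centers of other samples in the batch. The function name `center_contrastive_loss`, the `mask` and `view_weights` arguments, and the temperature value are all hypothetical and not taken from the paper. The sketch illustrates why a center formulation needs only one comparison target per view (linear in the number of views) rather than one per view pair, and how missing views can be skipped instead of reconstructed.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def center_contrastive_loss(z, mask, view_weights=None, temperature=0.1):
    """Illustrative center-based contrastive loss (an assumption, not the paper's exact loss).

    z            : (B, V, D) array, one embedding per sample and view
    mask         : (B, V) array, 1 where the view is available, 0 where it is missing
    view_weights : optional (V,) array of per-view quality weights
    """
    z = l2_normalize(np.asarray(z, dtype=float))
    B, V, _ = z.shape
    if view_weights is None:
        view_weights = np.ones(V)
    # Missing views get weight 0, so they affect neither the centers nor the loss.
    w = np.asarray(mask, dtype=float) * np.asarray(view_weights, dtype=float)[None, :]

    # One weighted center per sample, built only from its available views.
    centers = l2_normalize((w[:, :, None] * z).sum(axis=1))        # (B, D)

    # Each view is compared with the B centers, not with every other view.
    logits = np.einsum("bvd,cd->bvc", z, centers) / temperature    # (B, V, B)
    logits = logits - logits.max(axis=-1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    # The positive for view (b, v) is the center of its own sample b.
    idx = np.arange(B)
    pos = log_prob[idx, :, idx]                                    # (B, V)
    return float(-(w * pos).sum() / max(w.sum(), 1e-8))

# Toy usage: 4 samples, 3 views, some views missing.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3, 8))
mask = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [0, 1, 1],
                 [1, 1, 0]])
weights = np.array([1.0, 0.5, 1.0])
loss = center_contrastive_loss(z, mask, view_weights=weights)
print(loss)
```

In this sketch a sample with V available views contributes V view-to-center terms, whereas a pairwise formulation would contribute on the order of V² view-to-view terms. Whatever values sit in the masked slots are ignored, which mirrors the abstract's point that missing views are not reconstructed.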

Taxonomy

Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Domain Adaptation and Few-Shot Learning