MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment

Huangbiao Xu; Huanqi Wu; Xiao Ke; Junyi Wu; Rui Xu; Jinglin Xu

arXiv:2511.17397 · cs.CV · December 9, 2025

TL;DR

This paper introduces MCMoE, a framework that effectively reconstructs missing modalities in multimodal action quality assessment, enabling robust evaluation even with incomplete data by using a mixture of experts for dynamic modality fusion.

Contribution

The paper proposes a novel Mixture of Experts framework with an adaptive gated modality generator for reconstructing missing modalities in multimodal AQA tasks.

Findings

1. Achieves state-of-the-art results on three public benchmarks.
2. Effectively handles incomplete multimodal data during inference.
3. Demonstrates robustness in both complete and incomplete modality scenarios.

Abstract

Multimodal Action Quality Assessment (AQA) has recently emerged as a promising paradigm. By leveraging complementary information across shared contextual cues, it enhances the discriminative evaluation of subtle intra-class variations in highly similar action sequences. However, partial modalities are frequently unavailable at the inference stage in reality. The absence of any modality often renders existing multimodal models inoperable. Furthermore, it triggers catastrophic performance degradation due to interruptions in cross-modal interactions. To address this issue, we propose a novel Missing Completion Framework with Mixture of Experts (MCMoE) that unifies unimodal and joint representation learning in single-stage training. Specifically, we propose an adaptive gated modality generator that dynamically fuses available information to reconstruct missing modalities. We then design…
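To make the idea of gated completion concrete, here is a minimal sketch of how available modality features might be fused to stand in for a missing one. It is not the authors' implementation: the function name `gated_completion`, the modality names (`video`, `audio`, `flow`), the scalar per-modality gate, and the fixed `gate_weights` are all assumptions chosen for illustration. In a trained model the gate parameters would be learned, and the generator would likely be far richer than a weighted average.

```python
import math
from typing import Dict, List, Optional


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_completion(
    features: Dict[str, Optional[List[float]]],
    gate_weights: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Fill each missing modality (value None) with a gated fusion of the
    modalities that are present. Hypothetical illustration only."""
    available = {k: v for k, v in features.items() if v is not None}
    if not available:
        raise ValueError("at least one modality must be available")

    names = sorted(available)
    # One scalar gate logit per available modality: dot(weights, feature).
    logits = [
        sum(w * x for w, x in zip(gate_weights[n], available[n]))
        for n in names
    ]
    gates = softmax(logits)

    # Gate-weighted sum of the available features (all assumed same length).
    dim = len(available[names[0]])
    fused = [
        sum(g * available[n][i] for g, n in zip(gates, names))
        for i in range(dim)
    ]

    # Keep observed features as-is; substitute the fused vector where missing.
    return {
        name: list(feat) if feat is not None else list(fused)
        for name, feat in features.items()
    }


# Usage: the "flow" modality is missing at inference time.
features = {"video": [1.0, 0.0], "audio": [0.0, 1.0], "flow": None}
gate_weights = {"video": [0.0, 0.0], "audio": [0.0, 0.0]}  # assumed, untrained
completed = gated_completion(features, gate_weights)
```

With all-zero gate weights the two available modalities receive equal gates, so the reconstructed `flow` feature is simply their average. Non-zero weights would let the gate favor whichever available modality is more informative for a given sample, which is the sense in which the fusion is dynamic.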

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Emotion and Mood Recognition