MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment
Huangbiao Xu, Huanqi Wu, Xiao Ke, Junyi Wu, Rui Xu, Jinglin Xu

TL;DR
This paper introduces MCMoE, a mixture-of-experts framework that reconstructs missing modalities in multimodal action quality assessment (AQA), so that evaluation stays robust when some modalities are unavailable at inference.
Contribution
The paper proposes a novel Mixture of Experts framework with an adaptive gated modality generator for reconstructing missing modalities in multimodal AQA tasks.
Findings
Achieves state-of-the-art results on three public benchmarks.
Effectively handles incomplete multimodal data during inference.
Demonstrates robustness in both complete and incomplete modality scenarios.
Abstract
Multimodal Action Quality Assessment (AQA) has recently emerged as a promising paradigm. By leveraging complementary information across shared contextual cues, it enhances the discriminative evaluation of subtle intra-class variations in highly similar action sequences. However, partial modalities are frequently unavailable at the inference stage in reality. The absence of any modality often renders existing multimodal models inoperable. Furthermore, it triggers catastrophic performance degradation due to interruptions in cross-modal interactions. To address this issue, we propose a novel Missing Completion Framework with Mixture of Experts (MCMoE) that unifies unimodal and joint representation learning in single-stage training. Specifically, we propose an adaptive gated modality generator that dynamically fuses available information to reconstruct missing modalities. We then design…
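The abstract describes a gated generator that fuses the available modalities to reconstruct a missing one. The sketch below illustrates that general idea only and is not the authors' architecture: the function `gated_generate`, the parameters `gate_w` and `proj_w`, the modality names `visual` and `audio`, and all shapes are assumptions chosen for illustration. A softmax gate weights each available modality per time step, and a linear projection maps the fused feature into the space of the missing modality.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_generate(available, gate_w, proj_w):
    """Reconstruct a missing-modality feature from the available ones.

    available: list of (T, D) arrays, one per available modality (assumed shapes)
    gate_w:    (D,) vector scoring each modality at each time step (hypothetical)
    proj_w:    (D, D) projection into the missing modality's space (hypothetical)
    """
    stacked = np.stack(available, axis=0)              # (M, T, D)
    scores = stacked @ gate_w                          # (M, T)
    gates = softmax(scores, axis=0)                    # sums to 1 over modalities
    fused = (gates[..., None] * stacked).sum(axis=0)   # (T, D) gated mixture
    return fused @ proj_w, gates

# Toy inputs: random features stand in for real encoder outputs.
rng = np.random.default_rng(0)
T, D = 4, 8
visual = rng.normal(size=(T, D))
audio = rng.normal(size=(T, D))
gate_w = rng.normal(size=D)
proj_w = rng.normal(size=(D, D))

recon, gates = gated_generate([visual, audio], gate_w, proj_w)
```

With a single available modality the gate is 1 everywhere, so the reconstruction reduces to a plain projection of that modality. In a trained model the gate and projection would be learned, typically against the true missing-modality features when they are present during training.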
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Emotion and Mood Recognition
