PIDNet: Progressive Implicit Decouple Network for Multimodal Action Quality Assessment
Qiqi Li, Pengfei Wang, and Nenggan Zheng

TL;DR
PIDNet is a progressive implicit decoupling and fusion network for multimodal action quality assessment. It integrates modality-specific information, cross-modal complementary cues, and global quality semantics in successive stages.
Contribution
The paper proposes the PIDNet architecture, whose iMambaWave and Group3M modules target multimodal feature disentanglement and fusion for action quality assessment.
Findings
PIDNet achieves state-of-the-art correlation scores on the Rhythmic Gymnastics and Fis-V datasets.
The iMambaWave module enhances temporal-domain and frequency-domain representations.
Ablation studies confirm the effectiveness of each component in PIDNet.
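The summary does not say which correlation is reported. Action quality assessment benchmarks are commonly scored with Spearman's rank correlation between predicted and judge scores, so the sketch below assumes that metric. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(predicted, target):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rp, rt = average_ranks(predicted), average_ranks(target)
    mp, mt = mean(rp), mean(rt)
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / (var_p * var_t) ** 0.5


# Made-up scores for four routines (illustrative only, not real results).
judge_scores = [12.5, 15.0, 9.8, 17.2]
model_scores = [11.0, 16.1, 10.2, 15.9]
rho = spearman(model_scores, judge_scores)
print(round(rho, 3))
```

Because the metric depends only on rank order, a model can score well even when its absolute predictions are offset from the judges' scale.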
Abstract
Action quality assessment (AQA) aims to automatically quantify the execution quality of human actions in videos and is valuable for applications such as competitive sports judging. In multimodal AQA, quality evidence from different modalities is heterogeneous, and quality cues evolve progressively over time. Existing methods often rely on coarse fusion or unified temporal modeling, which may blur modality-specific cues, preserve cross-modal redundancy, and weaken stage-specific quality evidence. To address these issues, we propose a progressive implicit decoupling and fusion network (PIDNet) that progressively integrates modality-specific information, cross-modal complementary cues, and global quality semantics for accurate assessment. Specifically, we design an iMambaWave module that maps RGB, optical flow, and audio features into a shared latent space and disentangles them with a…
