Multivariate Gaussian Representation Learning for Medical Action Evaluation
Luming Yang, Haoxian Liu, Siqing Li, Alper Yilmaz

TL;DR
This paper introduces a new benchmark dataset and a Gaussian-based framework for fine-grained medical action evaluation, improving accuracy and robustness on rapid, complex motions.
Contribution
The paper presents CPREval-6k, a comprehensive medical action dataset, and GaussMedAct, a novel Gaussian encoding method for enhanced spatiotemporal modeling in medical motion analysis.
Findings
Achieves 92.1% Top-1 accuracy with real-time inference.
Outperforms the baseline by +5.9% accuracy while using fewer FLOPs.
Demonstrates robustness across multiple datasets.
Abstract
Fine-grained action evaluation in medical vision faces unique challenges: comprehensive datasets are unavailable, precision requirements are stringent, and the spatiotemporal dynamics of very rapid actions are insufficiently modeled. To support development and evaluation, we introduce CPREval-6k, a multi-view, multi-label medical action benchmark containing 6,372 expert-annotated videos with 22 clinical labels. Using this dataset, we present GaussMedAct, a multivariate Gaussian encoding framework that advances medical motion analysis through adaptive spatiotemporal representation learning. Multivariate Gaussian Representation projects joint motions into a temporally scaled multi-dimensional space and decomposes actions into adaptive 3D Gaussians that serve as tokens. These tokens preserve motion semantics through anisotropic covariance modeling while maintaining robustness to…
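The abstract does not specify how joint motions become Gaussian tokens. The sketch below is one illustrative reading, not the authors' method. It assumes 2D keypoints of shape (T, J, 2) and lifts each joint into (x, y, scaled time) space. It then summarizes each joint over a fixed window of frames with a mean and a full 3x3 covariance. The full covariance is what makes the Gaussian anisotropic: it can stretch along the direction a joint moves. The function name `gaussian_tokens` and the parameters `window`, `time_scale`, and `eps` are hypothetical. GaussMedAct's Gaussians are described as adaptive, whereas this sketch uses fixed windows.

```python
import numpy as np


def gaussian_tokens(keypoints, window=4, time_scale=0.1, eps=1e-6):
    """Hypothetical sketch: encode a skeleton sequence as 3D Gaussian tokens.

    keypoints:  (T, J, 2) array of per-frame (x, y) joint coordinates.
    window:     frames summarized by one Gaussian (fixed here, not adaptive).
    time_scale: factor mapping frame index onto the third (time) axis.
    eps:        small ridge added to each covariance for numerical stability.
    Returns (T // window, J, 9): 3 mean values followed by the 6 upper-triangle
    covariance entries in the order xx, xy, xt, yy, yt, tt.
    """
    kp = np.asarray(keypoints, dtype=float)
    T, J, _ = kp.shape
    n_win = T // window

    # Lift every joint position into a temporally scaled 3D space (x, y, s*t).
    t = np.arange(T, dtype=float)[:, None, None] * time_scale
    pts = np.concatenate([kp, np.broadcast_to(t, (T, J, 1))], axis=-1)

    # Group consecutive frames into windows: (n_win, window, J, 3).
    pts = pts[: n_win * window].reshape(n_win, window, J, 3)

    # Per-window, per-joint Gaussian: mean and full (anisotropic) covariance.
    mu = pts.mean(axis=1)                                # (n_win, J, 3)
    d = pts - mu[:, None]                                # centered points
    cov = np.einsum("nwjc,nwjd->njcd", d, d) / window    # (n_win, J, 3, 3)
    cov = cov + eps * np.eye(3)

    # Flatten each symmetric covariance to its 6 unique entries.
    iu = np.triu_indices(3)
    return np.concatenate([mu, cov[..., iu[0], iu[1]]], axis=-1)
```

For example, consider a joint sliding along x at one unit per frame. Its token gets a large xx variance, an xt covariance coupling position to time, and a near-zero yy variance. A stationary joint yields an almost degenerate Gaussian.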
Taxonomy
Topics: Human Pose and Action Recognition · Balance, Gait, and Falls Prevention · Prosthetics and Rehabilitation Robotics
