FixMyPose: Pose Correctional Captioning and Retrieval
Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal

TL;DR
FixMyPose introduces a new dataset and tasks for automated pose correction captioning and retrieval, addressing the need for scalable, personalized feedback in physical exercises through diverse, multilingual data and strong baseline models.
Contribution
The paper presents the FixMyPose dataset, defines pose correction captioning and retrieval tasks, and develops baseline models and new evaluation metrics for them. The descriptions are collected in both English and Hindi, and the dataset is balanced across characters with diverse demographics.
Findings
Baseline models perform competitively on image-difference datasets.
New task-specific metrics provide reliable evaluation.
Models show promising transfer to real-world images.
Abstract
Interest in physical therapy and individual exercises such as yoga/dance has increased alongside the well-being trend. However, such exercises are hard to follow without expert guidance (which is impossible to scale for personalized feedback to every trainee remotely). Thus, automated pose correction systems are required more than ever, and we introduce a new captioning dataset named FixMyPose to address this need. We collect descriptions of correcting a "current" pose to look like a "target" pose (in both English and Hindi). The collected descriptions have interesting linguistic properties such as egocentric relations to environment objects, analogous references, etc., requiring an understanding of spatial relations and commonsense knowledge about postures. Further, to avoid ML biases, we maintain a balance across characters with diverse demographics, who perform a variety of movements…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Hand Gesture Recognition Systems
