From 3D Pose to Prose: Biomechanics-Grounded Vision-Language Coaching
Yuyang Ji, Yixuan Shen, Shengjie Zhu, Yu Kong, Feng Liu

TL;DR
BioCoach is a biomechanics-grounded vision-language framework for personalized fitness coaching from streaming video, integrating skeletal kinematics and biomechanical context for accurate, transparent feedback.
Contribution
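BioCoach introduces a three-stage pipeline (exercise-specific joint selection, structured biomechanical context, and cross-attention feedback generation), trained parameter-efficiently with frozen vision and language backbones. It also contributes the QEVD-bio-fit-coach dataset and a biomechanics-aware LLM judge metric for evaluation.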
Findings
BioCoach improves text quality and correctness in fitness coaching.
It maintains temporal triggering while enhancing coaching accuracy.
The framework demonstrates the importance of explicit kinematics and biomechanical constraints.
Abstract
We present BioCoach, a biomechanics-grounded vision-language framework for fitness coaching from streaming video. BioCoach fuses visual appearance and 3D skeletal kinematics through a novel three-stage pipeline: an exercise-specific degree-of-freedom selector that focuses analysis on salient joints; a structured biomechanical context that pairs individualized morphometrics with cycle and constraint analysis; and a vision-biomechanics conditioned feedback module that applies cross-attention to generate precise, actionable text. Using parameter-efficient training that freezes the vision and language backbones, BioCoach yields transparent, personalized reasoning rather than pattern matching. To enable learning and fair evaluation, we augment QEVD-fit-coach with biomechanics-oriented feedback to create QEVD-bio-fit-coach, and we introduce a biomechanics-aware LLM judge metric. BioCoach…
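To make the idea of "3D skeletal kinematics" and a "degree-of-freedom selector" concrete, the sketch below computes a joint angle from three 3D keypoints and then ranks joints by how much they move over a clip. This is an illustration only, not the paper's method: the function names `joint_angle` and `select_salient_dofs`, the range-of-motion ranking rule, and the sample angle values are all assumptions introduced here, since the summary does not describe how BioCoach's selector actually works.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by the 3D points a-b-c.

    Hypothetical helper: e.g. a=hip, b=knee, c=ankle gives a knee angle.
    """
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        raise ValueError("degenerate joint: coincident keypoints")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def select_salient_dofs(angle_series, k):
    """Keep the k joints with the largest range of motion across frames.

    ASSUMPTION: range of motion is a stand-in criterion chosen for
    illustration; the paper's selector may use a different rule.
    angle_series maps a joint name to its per-frame angles in degrees.
    """
    rom = {name: max(vals) - min(vals) for name, vals in angle_series.items()}
    return sorted(rom, key=rom.get, reverse=True)[:k]


# Made-up per-frame angles for a squat-like motion (illustrative values).
example_series = {
    "knee": [170.0, 120.0, 90.0, 125.0, 172.0],
    "hip": [175.0, 130.0, 100.0, 135.0, 174.0],
    "elbow": [160.0, 158.0, 161.0, 159.0, 160.0],
}
salient = select_salient_dofs(example_series, k=2)
```

In this toy setup the near-static elbow is dropped and the two lower-body joints are kept, which mirrors the stated goal of focusing analysis on the joints that matter for a given exercise.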
