From 3D Pose to Prose: Biomechanics-Grounded Vision–Language Coaching

Yuyang Ji; Yixuan Shen; Shengjie Zhu; Yu Kong; Feng Liu

arXiv:2603.26938·cs.CV·March 31, 2026

TL;DR

BioCoach is a biomechanics-grounded vision-language framework for personalized fitness coaching from streaming video, integrating skeletal kinematics and biomechanical context for accurate, transparent feedback.

Contribution

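BioCoach introduces a three-stage pipeline (exercise-specific joint selection, structured biomechanical context, and cross-attention feedback generation), trained parameter-efficiently with frozen vision and language backbones, along with new evaluation metrics, including a biomechanics-aware LLM judge.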
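To make the cross-attention step concrete, here is a minimal sketch of single-head scaled dot-product cross-attention in plain Python. It is an illustration only, not the authors' implementation: the choice of visual tokens as queries and biomechanical tokens as keys and values is an assumption, the function names are hypothetical, and the learned projection matrices a real module would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: list of query vectors (assumed here: visual tokens)
    keys, values: lists of vectors (assumed here: biomechanical tokens)
    Returns one attended vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(attended)
    return outputs


# Hypothetical toy inputs: one visual token attending over two biomechanical tokens.
visual_tokens = [[0.0, 0.0]]
biomech_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(visual_tokens, biomech_tokens, biomech_tokens)
```

With a zero query every key scores the same, so the output is simply the mean of the value vectors; a non-zero query shifts weight toward the biomechanical tokens it aligns with.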

Findings

01 BioCoach improves text quality and correctness in fitness coaching.

02 It maintains temporal triggering while enhancing coaching accuracy.

03 The framework demonstrates the importance of explicit kinematics and biomechanical constraints.

Abstract

We present BioCoach, a biomechanics-grounded vision–language framework for fitness coaching from streaming video. BioCoach fuses visual appearance and 3D skeletal kinematics through a novel three-stage pipeline: an exercise-specific degree-of-freedom selector that focuses analysis on salient joints; a structured biomechanical context that pairs individualized morphometrics with cycle and constraint analysis; and a vision–biomechanics conditioned feedback module that applies cross-attention to generate precise, actionable text. Using parameter-efficient training that freezes the vision and language backbones, BioCoach yields transparent, personalized reasoning rather than pattern matching. To enable learning and fair evaluation, we augment QEVD-fit-coach with biomechanics-oriented feedback to create QEVD-bio-fit-coach, and we introduce a biomechanics-aware LLM judge metric. BioCoach…
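As a rough illustration of what "3D skeletal kinematics" and "salient joints" can mean in practice, the sketch below computes a joint angle from three 3D keypoints and ranks joints by their range of motion across frames. This is a stand-in heuristic under stated assumptions, not the paper's degree-of-freedom selector: the abstract does not say how the selector works, and the function names, joint names, and angle values here are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("joint angle is undefined for coincident keypoints")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def select_salient_dofs(angle_series, top_k=2):
    """Rank joints by range of motion (max - min angle) and keep the top_k.

    Assumption: range of motion is used as a simple proxy for salience.
    """
    range_of_motion = {
        name: max(angles) - min(angles)
        for name, angles in angle_series.items()
    }
    return sorted(range_of_motion, key=range_of_motion.get, reverse=True)[:top_k]


# Hypothetical per-frame angles (degrees) over one squat repetition.
squat_angles = {
    "knee": [170.0, 90.0, 170.0],
    "elbow": [160.0, 158.0, 161.0],
    "hip": [175.0, 100.0, 172.0],
}
salient = select_salient_dofs(squat_angles, top_k=2)  # knee and hip move the most
```

In this toy squat the knee and hip sweep through far larger ranges than the elbow, so a range-of-motion ranking keeps those two joints for further analysis.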
