ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman

TL;DR
ExpertAF is a novel multimodal method that generates detailed, actionable coaching feedback from videos of physical activities, combining expert commentary and visual demonstrations to improve skill learning.
Contribution
It introduces a weakly-supervised training approach leveraging existing datasets and a multimodal model to produce comprehensive coaching feedback from video and pose data.
Findings
Outperforms strong vision-language models on established metrics
Generates expert commentary and visual corrections effectively
Receives higher human preference scores
Abstract
Feedback is essential for learning a new skill or improving one's current skill level. However, current methods for skill assessment from video only provide scores or compare demonstrations, leaving the burden of knowing what to do differently on the user. We introduce a novel method to generate actionable feedback (AF) from video of a person doing a physical activity, such as basketball or soccer. Our method takes a video demonstration and its accompanying 3D body pose and generates (1) free-form expert commentary describing what the person is doing well and what they could improve, and (2) a visual expert demonstration that incorporates the required corrections. We show how to leverage Ego-Exo4D's [29] videos of skilled activity and expert commentary together with a strong language model to create a weakly-supervised training dataset for this task, and we devise a multimodal…
Taxonomy
Topics: AI-based Problem Solving and Planning
