The Potential and Limitations of Vision-Language Models for Human Motion Understanding: A Case Study in Data-Driven Stroke Rehabilitation

Victor Li; Naveenraj Kamalakannan; Avinash Parnandi; Heidi Schambra; Carlos Fernandez-Granda

arXiv:2511.17727 · cs.CV · November 25, 2025

Open Access

TL;DR

This study assesses the capabilities of vision-language models in stroke rehabilitation, revealing current limitations in fine-grained motion understanding but also highlighting their potential in classifying activities and estimating rehabilitation dose without specialized training.

Contribution

The paper presents a case study applying VLMs to stroke rehabilitation, showing their strengths and weaknesses in quantifying impairment and activity from videos.

Findings

01

VLMs can classify high-level activities from a few frames.

02

VLMs detect motion and grasp with moderate accuracy.

03

VLMs estimate dose counts to within 25% for mildly impaired participants.

Abstract

Vision-language models (VLMs) have demonstrated remarkable performance across a wide range of computer-vision tasks, sparking interest in their potential for digital health applications. Here, we apply VLMs to two fundamental challenges in data-driven stroke rehabilitation: automatic quantification of rehabilitation dose and impairment from videos. We formulate these problems as motion-identification tasks, which can be addressed using VLMs. We evaluate our proposed framework on a cohort of 29 healthy controls and 51 stroke survivors. Our results show that current VLMs lack the fine-grained motion understanding required for precise quantification: dose estimates are comparable to a baseline that excludes visual information, and impairment scores cannot be reliably predicted. Nevertheless, several findings suggest future promise. With optimized prompting and post-processing, VLMs can…

Taxonomy

Topics: Stroke Rehabilitation and Recovery · Human Pose and Action Recognition · Balance, Gait, and Falls Prevention