HMR-1: Hierarchical Massage Robot with Vision-Language-Model for Embodied Healthcare
Rongtao Xu, Mingming Yu, Xiaofeng Han, Yu Zhang, Kaiyi Hu, Zhe Feng, Zenghuang Fu, Changwei Wang, Weiliang Meng, Xiaopeng Zhang

TL;DR
This paper introduces a hierarchical embodied massage framework utilizing vision-language models, along with a large multimodal dataset, to advance healthcare robotics and establish evaluation benchmarks.
Contribution
It presents a novel hierarchical massage framework, together with a new multimodal dataset and benchmarks for embodied healthcare tasks, addressing the lack of standardized evaluation benchmarks and open-source acupoint massage data.
Findings
The dataset contains over 12,000 images and 174,000 QA pairs.
Fine-tuning Qwen-VL improves acupoint grounding accuracy.
Physical experiments validate the framework's practical effectiveness.
Abstract
The rapid advancement of Embodied Intelligence has opened transformative opportunities in healthcare, particularly in physical therapy and rehabilitation. However, critical challenges remain in developing robust embodied healthcare solutions, such as the lack of standardized evaluation benchmarks and the scarcity of open-source multimodal acupoint massage datasets. To address these gaps, we construct MedMassage-12K, a multimodal dataset containing 12,190 images with 174,177 QA pairs that covers diverse lighting conditions and backgrounds. Furthermore, we propose a hierarchical embodied massage framework, which includes a high-level acupoint grounding module and a low-level control module. The high-level acupoint grounding module uses multimodal large language models to understand human language and identify acupoint locations, while the low-level control module provides the planned…
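The abstract describes a two-level split, but the text here does not specify how the levels hand off to each other. The following is a minimal sketch under our own assumptions, not the authors' implementation: the high-level module is assumed to return a pixel location for a named acupoint, and the low-level side back-projects that pixel with a pinhole camera model and a depth reading, then emits a simple straight-line approach. `Grounding`, `Intrinsics`, `backproject`, `plan_press`, `hierarchical_step`, `acupoint_A`, and the 5 cm approach offset are all hypothetical; the vision-language model (e.g. a fine-tuned Qwen-VL) is replaced by an injected stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Grounding:
    """Hypothetical high-level output: an acupoint label and its pixel location."""
    acupoint: str
    u: float  # pixel column
    v: float  # pixel row


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(g: Grounding, depth_m: float, k: Intrinsics) -> Point3:
    """Lift the grounded pixel to a 3D point in the camera frame (metres)."""
    x = (g.u - k.cx) * depth_m / k.fx
    y = (g.v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


def plan_press(target: Point3, approach_m: float = 0.05, steps: int = 5) -> List[Point3]:
    """Evenly spaced waypoints along the optical axis, starting approach_m short
    of the target and ending exactly on it. Assumes steps >= 2."""
    x, y, z = target
    return [(x, y, z - approach_m * (1 - i / (steps - 1))) for i in range(steps)]


def hierarchical_step(
    instruction: str,
    image: object,
    ground: Callable[[str, object], Grounding],
    depth_at: Callable[[float, float], float],
    k: Intrinsics,
) -> Tuple[Grounding, List[Point3]]:
    """High level: grounding by an injected model. Low level: geometry plus a simple plan."""
    g = ground(instruction, image)
    target = backproject(g, depth_at(g.u, g.v), k)
    return g, plan_press(target)


def fake_vlm(text: str, image: object) -> Grounding:
    """Stub standing in for a fine-tuned VLM; it ignores its inputs."""
    return Grounding("acupoint_A", 380.0, 240.0)


if __name__ == "__main__":
    k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    g, path = hierarchical_step("press acupoint_A", None, fake_vlm, lambda u, v: 0.5, k)
    print(g.acupoint, path[-1])
```

Passing the grounding model in as a callable keeps the high-level component swappable without touching the geometry. A real controller would also need the camera-to-robot transform and force regulation during contact, which this sketch omits.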
Taxonomy
Topics: Social Robot Interaction and HRI · Action Observation and Synchronization · Human Pose and Action Recognition
