LocoVLM: Grounding Vision and Language for Adapting Versatile Legged Locomotion Policies

I Made Aswin Nahrendra; Seunghyun Lee; Dongkyu Lee; Hyun Myung

arXiv:2602.10399 · cs.RO · February 12, 2026

Open Access

TL;DR

This paper introduces LocoVLM, a framework that uses pre-trained language and vision-language models to adapt legged robot locomotion in real time, so the robot can respond to human instructions and other high-level semantic cues.

Contribution

LocoVLM combines foundation models with a style-conditioned locomotion policy, giving the robot semantic understanding and real-time adaptation without cloud dependency.

Findings

01. Achieves up to 87% instruction-following accuracy

02. Enables real-time semantic-grounded locomotion adaptation

03. First to demonstrate high-level reasoning for legged robot control

Abstract

Recent advances in legged locomotion learning are still dominated by the utilization of geometric representations of the environment, limiting the robot's capability to respond to higher-level semantics such as human instructions. To address this limitation, we propose a novel approach that integrates high-level commonsense reasoning from foundation models into the process of legged locomotion adaptation. Specifically, our method utilizes a pre-trained large language model to synthesize an instruction-grounded skill database tailored for legged robots. A pre-trained vision-language model is employed to extract high-level environmental semantics and ground them within the skill database, enabling real-time skill advisories for the robot. To facilitate versatile skill control, we train a style-conditioned policy capable of generating diverse and robust locomotion skills with high fidelity…
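The grounding step described in the abstract, where environmental semantics are matched against an instruction-grounded skill database to produce a skill advisory, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `Skill` fields, the three database entries, and the bag-of-words cosine similarity are hypothetical stand-ins, and the plain-text scene description stands in for whatever the vision-language model outputs. The abstract does not specify the retrieval mechanism, so this is not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill entry: an instruction plus made-up style parameters."""
    instruction: str
    gait_frequency_hz: float
    body_height_m: float


# Hypothetical database. In the paper an LLM synthesizes the entries;
# these three are hand-written placeholders.
SKILL_DB = [
    Skill("walk slowly and carefully on slippery ice", 1.5, 0.28),
    Skill("crouch low to pass under an obstacle", 2.0, 0.18),
    Skill("trot quickly across open flat ground", 3.5, 0.32),
]


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with counts (a crude text embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def advise_skill(description: str, db=SKILL_DB) -> Skill:
    """Return the database skill whose instruction best matches the description."""
    query = bag_of_words(description)
    return max(db, key=lambda s: cosine(query, bag_of_words(s.instruction)))


if __name__ == "__main__":
    # The description is a stand-in for a VLM's summary of the scene.
    skill = advise_skill("the floor ahead is slippery ice")
    print(skill.instruction, skill.gait_frequency_hz, skill.body_height_m)
```

In this sketch the selected entry's style parameters would then be passed to the style-conditioned policy as its conditioning input. A real system would presumably replace the token-overlap score with learned text embeddings.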

Taxonomy

Topics: Robotic Locomotion and Control · Social Robot Interaction and HRI · Robot Manipulation and Learning