Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers
Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, Xiaolong Wang

TL;DR
This paper introduces LocoTransformer, an end-to-end reinforcement learning approach using cross-modal transformers that combines proprioceptive and visual data to enhance quadrupedal robot locomotion and generalization in complex terrains.
Contribution
The paper presents a novel RL method that integrates visual and proprioceptive inputs with transformers for improved quadrupedal locomotion and transferability from simulation to real-world environments.
Findings
Significant performance improvement over baseline methods.
Enhanced generalization to unseen terrains and obstacles.
Successful transfer of policies from simulation to real robots.
Abstract
We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs. While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method that leverages both proprioceptive states and visual observations for locomotion control. We evaluate our method in challenging simulated…
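As a rough illustration of how proprioceptive and visual inputs might be combined in one Transformer layer, the sketch below runs single-head self-attention over a token sequence made of one proprioceptive token and several visual tokens. It is not the authors' implementation. The token dimension, the random weights, the per-modality pooling in `fuse`, and every function name are assumptions chosen for illustration, and the tokens are taken as already embedded.

```python
import math
import random


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token vectors."""
    dim = len(w_q)
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * v[i] for a, v in zip(attn, values))
                        for i in range(len(values[0]))])
    return outputs, weights


def fuse(proprio_token, visual_tokens, params):
    """Attend across both modalities, then pool each modality separately.

    Assumption: the fused feature is the proprioceptive output concatenated
    with the mean of the visual outputs. This pooling is illustrative only.
    """
    outputs, weights = self_attention([proprio_token] + visual_tokens, *params)
    proprio_out, visual_out = outputs[0], outputs[1:]
    visual_mean = [sum(col) / len(visual_out) for col in zip(*visual_out)]
    return proprio_out + visual_mean, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 4  # assumed token dimension
params = tuple(random_matrix(DIM, DIM, rng) for _ in range(3))
proprio = [rng.uniform(-1, 1) for _ in range(DIM)]
visual = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(4)]
fused, attn = fuse(proprio, visual, params)
```

In a full policy, a feature like `fused` would presumably feed an action head trained with RL. Each row of `attn` shows how strongly one token attends to every token of both modalities.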
Taxonomy
Topics: Human Pose and Action Recognition · Robotic Locomotion and Control · Multimodal Machine Learning Applications
