Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

Ruihan Yang; Minghao Zhang; Nicklas Hansen; Huazhe Xu; Xiaolong Wang

arXiv:2107.03996 · cs.LG · May 27, 2022 · 27 citations

TL;DR

This paper introduces LocoTransformer, an end-to-end reinforcement learning approach that uses a cross-modal Transformer to fuse proprioceptive states with depth-sensor observations, improving quadrupedal robot locomotion and generalization on complex terrain.

Contribution

The paper presents a novel RL method that fuses visual (depth) and proprioceptive inputs with a Transformer, improving quadrupedal locomotion and the transfer of policies from simulation to the real world.

Findings

01 Significant performance improvement over baseline methods.

02 Enhanced generalization to unseen terrains and obstacles.

03 Successful transfer of policies from simulation to real robots.

Abstract

We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs. While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method that leverages both proprioceptive states and visual observations for locomotion control. We evaluate our method in challenging simulated…
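
The abstract describes a Transformer that combines proprioceptive states with depth observations. Below is a minimal, dependency-free sketch of that cross-modal fusion idea, assuming a design in which the proprioceptive embedding becomes one token, a spatial feature map from a depth encoder becomes a set of visual tokens, and self-attention over the joined sequence lets each modality attend to the other. The token count (16, e.g. a 4x4 grid), embedding size, layer count, random weights, mean pooling, and the simplified single-head layer (no layer norm or feed-forward sublayer) are illustrative assumptions, not the authors' configuration; their PyTorch code is in the mehooz/vision4leg repository listed under Code & Models.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Layer = Tuple[Matrix, Matrix, Matrix]  # (W_q, W_k, W_v) for one attention layer


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for learned parameters (hypothetical init)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def self_attention(tokens: Matrix, layer: Layer) -> Matrix:
    """Single-head scaled dot-product self-attention plus a residual connection.

    Every token attends to every token, so the proprioceptive token can draw
    on visual tokens and each visual token can draw on proprioception.
    """
    wq, wk, wv = layer
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    out = []
    for token, qi in zip(tokens, q):
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        mixed = [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)]
        out.append([t + m for t, m in zip(token, mixed)])  # residual connection
    return out


def fuse(proprio: Vector, visual_tokens: Matrix, layers: List[Layer]) -> Vector:
    """Hypothetical fusion: [proprio token] + visual tokens -> attention -> pooled feature."""
    tokens = [proprio] + visual_tokens
    for layer in layers:
        tokens = self_attention(tokens, layer)
    proprio_out = tokens[0]
    visual_out = tokens[1:]
    # Mean over the visual tokens, one value per embedding dimension.
    visual_pooled = [sum(column) / len(visual_out) for column in zip(*visual_out)]
    return proprio_out + visual_pooled  # list concatenation -> 2 * DIM features


# Illustrative sizes (assumptions, not the authors' hyperparameters).
DIM = 8                  # shared token embedding size
NUM_VISUAL_TOKENS = 16   # e.g. a 4x4 spatial grid from a depth-image ConvNet
NUM_LAYERS = 2

rng = random.Random(0)
# Random stand-ins for the outputs of a proprioceptive MLP and a depth ConvNet.
proprio = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
visual = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_VISUAL_TOKENS)]
layers = [tuple(random_matrix(DIM, DIM, rng) for _ in range(3)) for _ in range(NUM_LAYERS)]

features = fuse(proprio, visual, layers)
print(len(features))  # 16: the input size a policy/value head would see
```

Pooling the visual tokens and concatenating them with the proprioceptive token yields a fixed-size feature regardless of how many visual tokens there are, which a policy and value head could consume. In an end-to-end RL setup these weights would be trained from the RL objective rather than drawn at random.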

Code & Models

Repositories

mehooz/vision4leg (PyTorch)

Videos

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers (SlidesLive)

Taxonomy

Topics: Human Pose and Action Recognition · Robotic Locomotion and Control · Multimodal Machine Learning Applications