Reinforcement Learning from Wild Animal Videos
Elliot Chane-Sane, Constant Roux, Olivier Stasse, Nicolas Mansard

TL;DR
This paper introduces RLWAV, a novel method that learns quadruped robot locomotion skills by leveraging wild animal videos, bridging the gap between natural animal movements and robotic control through reinforcement learning.
Contribution
The paper presents a new approach to robot learning that uses animal videos as a source of diverse motion data, enabling robots to acquire multiple skills without explicit trajectories or rewards.
Findings
Robots learned diverse skills like walking, jumping, and staying still.
The method successfully transfers policies from simulation to real robots.
Wild animal videos can effectively inform robot locomotion learning.
Abstract
We propose to learn legged robot locomotion skills by watching thousands of wild animal videos from the internet, such as those featured in nature documentaries. Indeed, such videos offer a rich and diverse collection of plausible motion examples, which could inform how robots should move. To achieve this, we introduce Reinforcement Learning from Wild Animal Videos (RLWAV), a method to ground these motions into physical robots. We first train a video classifier on a large-scale animal video dataset to recognize actions from RGB clips of animals in their natural habitats. We then train a multi-skill policy to control a robot in a physics simulator, using the classification score of a third-person camera capturing videos of the robot's movements as a reward for reinforcement learning. Finally, we directly transfer the learned policy to a real quadruped Solo. Remarkably, despite the…
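As a rough illustration of the reward described in the abstract, the sketch below turns a video classifier's output into a scalar reward for the commanded skill. The softmax over logits, the function names `softmax` and `video_reward`, the skill ordering, and the example logits are all assumptions made for illustration; the paper's actual classifier, camera rendering, and reward shaping are not reproduced here.

```python
import numpy as np

# Skill names taken from the Findings above; their order is an assumption.
SKILLS = ["walking", "jumping", "staying still"]


def softmax(logits):
    """Numerically stable softmax over a 1-D array of class logits."""
    z = np.asarray(logits, dtype=float)
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def video_reward(logits, skill_index):
    """Hypothetical reward: classifier probability of the commanded skill.

    `logits` stands in for the output of a video classifier applied to a
    third-person clip of the simulated robot (not implemented here).
    """
    probs = softmax(logits)
    return float(probs[skill_index])


# Made-up logits for one clip, used only to exercise the function.
example_logits = [2.0, 0.5, -1.0]
walk_reward = video_reward(example_logits, SKILLS.index("walking"))
jump_reward = video_reward(example_logits, SKILLS.index("jumping"))
```

In a training loop, a value like `walk_reward` would presumably be computed for each rendered clip and passed to the reinforcement learning algorithm as the reward for the commanded skill, so that one policy conditioned on the skill index can learn several behaviours.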
Peer Reviews
Decision · Submitted to ICLR 2025
- The paper studies an interesting problem of learning reward models from videos.
- The proposed approach is interesting and in a good direction.
- The paper is well written and the presentation is clear.
- The positioning of the paper (title, abstract, intro) is a bit misleading. It suggests that the reward function would come purely from videos. However, the approach uses a number of hand-designed reward terms such as air time, symmetry, and terminations. I think that this is ok, but the positioning of the paper should be updated to reflect that. In the current version of the approach, the video model serves only as part of the overall reward function.
- The results are promising but overall limited. L
1. Learning quadruped robot locomotion skills from existing wild animal locomotion is a good inspiration.
2. The task setup and experimental details are described clearly in the paper.
1. The current ablation study of the classifier training set is inadequate, making it hard to determine whether the method effectively utilizes cross-embodiment skills acquired from a diverse range of wild animal videos. The ablation should encompass factors such as the size of the training set and the number of different types of animals included in it.
2. While we anticipate gaining insights into four-legged movement skills from wild animal datasets, the only information we can provide the rob
1. The idea is novel.
2. The paper is well written and easy to follow.
3. Experiments and ablations within the proposed algorithm show the effectiveness of the method.
1. The paper seems to lack comparison to baselines or other works. For example, can the results in simulation be compared with hand-crafted reward models? That would allow a comparison of the sample efficiency of the proposed method.
2. It would be useful to know how large the animal dataset needs to be for the method to work. This work uses 8.7K videos. Are more needed, or can it work with fewer? Can an ablation on this be added?
Taxonomy
Topics: Human Pose and Action Recognition · Robotic Locomotion and Control
