SDS -- See it, Do it, Sorted: Quadruped Skill Synthesis from Single Video Demonstration
Maria Stamatopoulou, Jeffrey Li, and Dimitrios Kanoulas

TL;DR
SDS enables quadruped robots to learn multiple gaits from a single unstructured video demonstration by training on GPT-4o-generated reward functions, achieving high gait fidelity and stable real-world locomotion with efficient training.
Contribution
The paper introduces SDS, a novel pipeline that synthesizes quadruped locomotion skills from a single unlabeled video, using GPT-4o to generate reward functions and self-supervised signals from training to refine them.
Findings
Achieves 100% gait matching fidelity in both simulation and the real world
Generalizes to different quadruped morphologies like ANYmal
Outperforms prior methods in data efficiency and training speed
Abstract
Imagine a robot learning locomotion skills from any single video, without labels or reward engineering. We introduce SDS ("See it. Do it. Sorted."), an automated pipeline for skill acquisition from unstructured demonstrations. Using GPT-4o, SDS applies novel prompting techniques, in the form of spatio-temporal grid-based visual encoding () and structured input decomposition (SUS). These produce executable reward functions (RF) from the raw input videos. The RFs are used to train PPO policies and are optimized through closed-loop evolution, using training footage and performance metrics as self-supervised signals. SDS allows quadrupeds (e.g. Unitree Go1) to learn four gaits -- trot, bound, pace, and hop -- achieving 100% gait matching fidelity, Dynamic Time Warping (DTW) distance in the order of , and stable locomotion with zero failures, both in simulation and the real…
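The abstract reports Dynamic Time Warping (DTW) distance as one of its evaluation metrics. As a rough illustration of what that metric measures, here is a minimal sketch of textbook DTW between two 1-D sequences. It assumes an absolute-difference local cost and no normalization; the function name `dtw_distance`, the choice of signal being compared, and any scaling are illustrative assumptions, not details taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two 1-D numeric sequences.

    Assumption: local cost is |a_i - b_j| and the result is the
    unnormalized accumulated cost of the best alignment path.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical example: the same ramp sampled at two different speeds.
reference = [0.0, 1.0, 2.0]
stretched = [0.0, 1.0, 1.0, 2.0]
print(dtw_distance(reference, stretched))
```

Because DTW allows one sample to align with several samples in the other sequence, two trajectories that trace the same shape at different speeds score a small distance, which is why it suits comparing a learned gait against a demonstration whose timing may differ.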
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Assessment and Pedagogy · Multimodal Machine Learning Applications
