# Discovery of skill-switching criteria for learning agile quadruped locomotion

**Authors:** Wanming Yu, Fernando Acero, Vassil Atanassov, Chuanyu Yang, Ioannis Havoutis, Dimitrios Kanoulas, Zhibin Li

PMC · DOI: 10.3389/frobt.2026.1697159 · Frontiers in Robotics and AI · 2026-02-18

## TL;DR

This paper introduces a hierarchical framework in which a quadruped robot learns several gaits as separate skills and switches between them automatically according to its distance from a goal.

## Contribution

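A hierarchical framework that combines deep reinforcement learning with an outer optimization loop. Individual gait policies are learned from contact-pattern reward terms, a higher-level policy generates weights to compose them, and the distances at which skills should switch are updated as learning progresses.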
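The weight-based composition can be illustrated with a minimal sketch. It assumes the high-level policy emits one raw score per skill and that skills are combined as a softmax-weighted sum of joint targets; this summary does not state the exact composition rule used in the paper, so `softmax`, `blend_actions`, and the toy numbers are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def blend_actions(weight_logits, skill_actions):
    """Blend per-skill joint targets using normalized weights.

    weight_logits: one raw score per skill (assumed high-level policy output).
    skill_actions: one list of joint targets per skill, all of equal length.
    """
    weights = softmax(weight_logits)
    n_joints = len(skill_actions[0])
    return [
        sum(w * action[j] for w, action in zip(weights, skill_actions))
        for j in range(n_joints)
    ]

# Toy example: three skills (trot, bound, gallop), two joints each.
actions = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
blended = blend_actions([0.0, 0.0, 0.0], actions)  # equal scores give a plain average
```

With equal scores the result is the average of the three skill outputs; as one score grows, the blended action approaches that skill's output, which is one way a gradual transition between gaits could arise.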

## Key findings

- The framework learns trotting, bounding, and galloping and switches between them, first on a simulated Unitree A1 quadruped and then in real-world deployment.
- Skills switch according to the distance to the goal; compared to baselines, the approach shows improved learning performance and smoother, more continuous transitions.
- The method enables prompt recovery from unexpected failures during movement.
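Distance-based switching can be sketched as a reward on the high-level weights. The snippet below is illustrative only: the thresholds, the ordering of gaits by distance (trot when near, gallop when far), and the function names are assumptions, not values or definitions taken from the paper.

```python
def target_skill(distance_to_goal, near_threshold=1.0, far_threshold=3.0):
    """Return the skill favored at a given goal distance in metres.

    Assumed ordering: trot when close, bound at mid-range, gallop when far.
    """
    if distance_to_goal < near_threshold:
        return "trot"
    if distance_to_goal < far_threshold:
        return "bound"
    return "gallop"

def switching_reward(weights, distance_to_goal,
                     near_threshold=1.0, far_threshold=3.0):
    """Reward equals the weight placed on the distance-appropriate skill.

    weights: dict mapping skill name to a normalized weight in [0, 1].
    """
    skill = target_skill(distance_to_goal, near_threshold, far_threshold)
    return weights[skill]

# Example: mostly galloping while the goal is 5 m away.
weights = {"trot": 0.1, "bound": 0.2, "gallop": 0.7}
reward_far = switching_reward(weights, 5.0)
reward_near = switching_reward(weights, 0.5)
```

In a setup like this, an outer optimization loop would adjust `near_threshold` and `far_threshold` between training rounds rather than keeping them fixed.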

## Abstract

This study develops a hierarchical learning and optimization framework that can learn and achieve well-coordinated multi-skill locomotion. The learned multi-skill policy can switch between skills automatically and naturally while tracking arbitrarily positioned goals and can recover from failures promptly. The proposed framework is composed of a deep reinforcement learning process and an optimization process. First, the contact pattern is incorporated into the reward terms to learn different types of gaits as separate policies without the need for any other references. Then, a higher-level policy is learned to generate weights for individual policies to compose multi-skill locomotion in a goal-tracking task setting. Skills are automatically and naturally switched according to the distance to the goal. The appropriate distances for skill switching are incorporated into the reward calculation for learning the high-level policy and are updated by an outer optimization loop as learning progresses. We first demonstrate successful multi-skill locomotion in comprehensive tasks on a simulated Unitree A1 quadruped robot. We also deploy the learned policy in the real world, showcasing trotting, bounding, galloping, and their natural transitions as the goal position changes. Moreover, the learned policy can react to unexpected failures at any time, perform prompt recovery, and successfully resume locomotion. Compared to baselines, our proposed approach achieves all the learned agile skills with improved learning performance, enabling smoother and more continuous skill transitions.
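To make the contact-pattern reward idea concrete, here is a minimal sketch. It assumes each gait is described by a short cycle of desired stance/swing states per foot and that the reward is the fraction of feet matching the desired state; the two-phase patterns, the foot ordering, and `contact_reward` are hypothetical simplifications, not the paper's reward terms.

```python
# Desired stance (1) or swing (0) per foot, ordered FL, FR, RL, RR.
# A two-phase trot pairs diagonal legs; a two-phase bound pairs front and rear legs.
TROT = [(1, 0, 0, 1), (0, 1, 1, 0)]
BOUND = [(1, 1, 0, 0), (0, 0, 1, 1)]

def contact_reward(measured, pattern, phase):
    """Fraction of feet whose contact state matches the desired pattern.

    measured: tuple of 0/1 contact flags for FL, FR, RL, RR.
    pattern:  list of desired contact tuples, one per gait sub-phase.
    phase:    gait phase in [0, 1), mapped onto the pattern cycle.
    """
    idx = int(phase * len(pattern)) % len(pattern)
    desired = pattern[idx]
    matches = sum(1 for m, d in zip(measured, desired) if m == d)
    return matches / len(desired)

# A diagonal stance pair at the start of the cycle matches the trot pattern fully.
full_match = contact_reward((1, 0, 0, 1), TROT, 0.0)
# The same contacts half a cycle later are exactly out of phase.
no_match = contact_reward((1, 0, 0, 1), TROT, 0.5)
```

Because the reward depends only on which feet touch the ground and when, a term of this kind needs no reference joint trajectories, which is consistent with the abstract's statement that gaits are learned without other references.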

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Equus caballus (domestic horse, species) [taxon 9796]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12957656/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12957656/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12957656/full.md

---
Source: https://tomesphere.com/paper/PMC12957656