# A Two-Stage Reinforcement Learning Framework for Humanoid Robot Sitting and Standing-Up

**Authors:** Xisheng Jiang, Shihai Zhao, Yudi Zhu, Qingdu Li, Jianwei Zhang

PMC · DOI: 10.3390/biomimetics10110783 · 2025-11-17

## TL;DR

This paper introduces a two-stage reinforcement learning framework to teach humanoid robots to sit and stand up smoothly in real-world scenarios.

## Contribution

A novel two-stage RL framework that improves motion smoothness and stability for humanoid robot sitting and standing.

## Key findings

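- The two-stage approach enables stable execution in two representative real-world scenarios: sitting down onto a chair and standing up from a chair.
- The tracking problem is modeled as a bi-level optimization that adjusts tracking precision according to the current tracking error, acting as an adaptive curriculum for smoother transitions.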
- The method was successfully applied to a 1.7 m adult-scale humanoid robot.

## Abstract

In everyday human environments, humanoid robots need not only to stand up smoothly but also to sit down autonomously for rest, energy management, and interaction. This capability is crucial for enhancing their autonomy and practicality. However, both sitting and standing involve complex dynamic constraints, diverse initial postures, and unstructured terrains, which make traditional hand-crafted controllers insufficient for multi-scenario demands. Reinforcement Learning (RL), with its generalization ability across high-dimensional state spaces and complex tasks, offers a promising solution for automatically generating motion control policies. Nevertheless, policies trained directly with RL often produce abrupt motions, making it difficult to balance smoothness and stability. To address these challenges, we propose a two-stage reinforcement learning framework. In the first stage, we focus on exploration and train initial policies for both sitting and standing, with relatively weak constraints on smoothness and joint safety, and without introducing noise. In the second stage, we refine the policies by tracking the motion trajectories obtained in the first stage, aiming for smoother transitions. We model the tracking problem as a bi-level optimization, where the tracking precision is dynamically adjusted based on the current tracking error, forming an adaptive curriculum mechanism. We apply this framework to a 1.7 m adult-scale humanoid robot, achieving stable execution in two representative real-world scenarios: sitting down onto a chair and standing up from a chair. Our approach provides a new perspective for the practical deployment of humanoid robots in real-world scenarios.
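
The abstract states that tracking precision is adjusted according to the current tracking error but does not give the update rule. As a rough illustration of the idea only, the sketch below assumes an exponential tracking reward `exp(-error / sigma)` and a tolerance `sigma` that tightens as a moving average of the error falls. The reward form, the class `AdaptiveTolerance`, and the parameters `sigma_init`, `sigma_min`, and `ema_rate` are all hypothetical and are not taken from the paper.

```python
import math


def tracking_reward(error: float, sigma: float) -> float:
    """Assumed reward shape: 1.0 at zero error, decaying with error / sigma."""
    return math.exp(-error / sigma)


class AdaptiveTolerance:
    """Hypothetical curriculum: sigma follows a moving average of the
    tracking error, is never loosened, and is clamped at sigma_min."""

    def __init__(self, sigma_init: float = 1.0, sigma_min: float = 0.05,
                 ema_rate: float = 0.1) -> None:
        self.sigma = sigma_init
        self.sigma_min = sigma_min
        self.ema_rate = ema_rate
        self.avg_error = sigma_init  # start the average at the initial tolerance

    def update(self, mean_error: float) -> float:
        # Exponential moving average of the observed tracking error.
        self.avg_error = ((1.0 - self.ema_rate) * self.avg_error
                          + self.ema_rate * mean_error)
        # Tighten sigma toward the average error, but never below sigma_min
        # and never above its previous value.
        self.sigma = max(self.sigma_min, min(self.sigma, self.avg_error))
        return self.sigma


if __name__ == "__main__":
    tol = AdaptiveTolerance()
    # Stand-in for per-iteration mean tracking errors from policy rollouts.
    for step, err in enumerate([0.9, 0.7, 0.5, 0.3, 0.2, 0.1]):
        sigma = tol.update(err)
        print(step, round(sigma, 3), round(tracking_reward(err, sigma), 3))
```

With a large `sigma` early in training, imprecise tracking of the stage-one trajectory still earns a useful reward; as the policy improves, the smaller `sigma` demands closer tracking. In a real training loop the update would be driven by errors measured from rollouts rather than a fixed list.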

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12650239/full.md

---
Source: https://tomesphere.com/paper/PMC12650239