# LG-H-PPO: offline hierarchical PPO for robot path planning on a latent graph

**Authors:** Xiang Han

PMC · DOI: 10.3389/frobt.2025.1737238 · Frontiers in Robotics and AI · 2026-01-07

## TL;DR

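This paper introduces LG-H-PPO, an offline hierarchical PPO method for robot path planning. It discretizes the continuous latent subgoal space into a structured latent graph, so the high-level policy selects subgoals instead of generating them, which improves training efficiency and stability on long-horizon navigation tasks.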

## Contribution

The main contribution is introducing graph structures into latent-variable hierarchical RL (HRL) planning, which simplifies the action space of the high-level policy and therefore its learning problem.

## Key findings

- In preliminary experiments on standard D4RL offline navigation benchmarks, LG-H-PPO outperforms advanced baselines such as Guider and HIQL in both convergence speed and final task success rate.
- Using a latent graph reduces the complexity of high-level planning, leading to more stable training.
- The method shows promise for future research combining latent representations with explicit graph planning.
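
The second finding can be stated a little more precisely. As a sketch only, assume the high-level policy at latent-graph node $v$ chooses the next node $v'$ among the graph neighbors $\mathcal{N}(v)$, conditioned on a goal $g$. The symbols $f_\theta$, $\mathcal{N}$, $g$, and $\hat{A}_t$ are illustrative and are not taken from the paper. Under that assumption the policy is a masked softmax, and the standard PPO clipped surrogate applies without modification:

$$
\pi_\theta(v' \mid v, g) = \frac{\exp f_\theta(v, g)_{v'}}{\sum_{u \in \mathcal{N}(v)} \exp f_\theta(v, g)_{u}}, \qquad v' \in \mathcal{N}(v)
$$

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(v_{t+1} \mid v_t, g)}{\pi_{\theta_{\mathrm{old}}}(v_{t+1} \mid v_t, g)}
$$

Here $r_t(\theta)$ is a ratio of categorical probabilities over at most $|\mathcal{N}(v_t)|$ options rather than a ratio of continuous densities over the whole latent space, which is one plausible reading of why discrete selection is easier to optimize.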

## Abstract

The path planning capability of autonomous robots in complex environments is crucial for their widespread application in the real world. However, long-horizon decision-making and sparse reward signals pose significant challenges to traditional reinforcement learning (RL) algorithms. Offline hierarchical reinforcement learning (HRL) offers an effective approach by decomposing tasks into two stages: high-level subgoal generation and low-level subgoal attainment. Advanced offline HRL methods, such as Guider and HIQL, typically introduce latent spaces in high-level policies to represent subgoals, thereby handling high-dimensional states and enhancing generalization. However, these approaches require the high-level policy to search for and generate subgoals within a continuous latent space. This remains a complex and sample-inefficient problem for policy optimization algorithms, particularly the policy-gradient-based PPO, and often leads to unstable training and slow convergence. To address this core limitation, this paper proposes a novel offline hierarchical PPO framework, LG-H-PPO (Latent Graph-based Hierarchical PPO). The core innovation of LG-H-PPO lies in discretizing the continuous latent space into a structured “latent graph.” By transforming high-level planning from challenging “continuous creation” to simple “discrete selection,” LG-H-PPO substantially reduces the learning difficulty for the high-level policy. Preliminary experiments on standard D4RL offline navigation benchmarks demonstrate that LG-H-PPO achieves significant advantages over advanced baselines such as Guider and HIQL in both convergence speed and final task success rate. The main contribution of this paper is introducing graph structures into latent-variable HRL planning. This effectively simplifies the action space for high-level policies, enhancing the training efficiency and stability of offline HRL algorithms for long-horizon navigation tasks. It lays the foundation for future offline HRL research combining latent-variable representations with explicit graph planning.
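
To make the shift from "continuous creation" to "discrete selection" concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes that latent states are already available as arrays, that graph nodes come from plain k-means clustering, and that an edge links two nodes whenever consecutive states in an offline trajectory fall into them. The function names (`kmeans`, `assign`, `build_latent_graph`, `select_subgoal`) and the random-walk data are hypothetical, chosen only for illustration.

```python
import numpy as np

def assign(z, centroids):
    """Return the index of the nearest centroid for every latent vector."""
    d = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

def kmeans(z, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the discretization step)."""
    rng = np.random.default_rng(seed)
    centroids = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(z, centroids)
        for j in range(k):
            members = z[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    return centroids

def build_latent_graph(trajs, k):
    """Nodes are cluster centroids; a directed edge i -> j is added when some
    offline trajectory moves from cluster i to cluster j in one step."""
    centroids = kmeans(np.concatenate(trajs), k)
    adj = np.zeros((k, k), dtype=bool)
    for traj in trajs:
        labels = assign(traj, centroids)
        adj[labels[:-1], labels[1:]] = True
    np.fill_diagonal(adj, False)  # drop self-loops
    return centroids, adj

def select_subgoal(logits, adj, node, rng):
    """Discrete selection: sample the next node from a softmax that is
    masked to the graph neighbors of the current node."""
    mask = adj[node]
    if not mask.any():
        return node  # no outgoing edge, so stay in place
    masked = np.where(mask, logits, -np.inf)
    p = np.exp(masked - masked.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

# Illustrative data: three 2-D random walks standing in for encoded trajectories.
rng = np.random.default_rng(0)
trajs = [np.cumsum(rng.normal(size=(50, 2)), axis=0) for _ in range(3)]
centroids, adj = build_latent_graph(trajs, k=5)
node = int(adj.sum(1).argmax())   # start at the node with the most neighbors
logits = rng.normal(size=5)       # stand-in for high-level policy outputs
subgoal = select_subgoal(logits, adj, node, rng)
```

In this sketch the high-level action space is just the neighbor set of the current node, so a categorical policy over a handful of indices replaces a continuous distribution over latent vectors. A low-level policy would then be asked to reach `centroids[subgoal]`.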

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12819167/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12819167/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC12819167/full.md

---
Source: https://tomesphere.com/paper/PMC12819167