Hierarchical Deep Deterministic Policy Gradient for Autonomous Maze Navigation of Mobile Robots

Wenjie Hu; Ye Zhou; Hann Woei Ho

arXiv:2508.04994·cs.RO·August 8, 2025

TL;DR

This paper introduces a Hierarchical DDPG (HDDPG) algorithm for mobile robot maze navigation. By addressing exploration and long-horizon planning challenges in complex environments, it reports gains of at least 56.59% in success rate and at least 519.03 in average reward.

Contribution

The paper proposes a novel hierarchical DDPG framework with high-level subgoal generation and low-level primitive actions, enhancing stability and exploration in maze navigation tasks.

Findings

01 Success rate increased by at least 56.59%.

02 Average reward improved by a minimum of 519.03.

03 Effective in complex maze environments with sparse rewards.
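
To make "sparse rewards" concrete, here is a minimal sketch that contrasts a sparse goal-reaching reward with a dense distance-based one. The reward function used in the paper is not given in this summary, so the names `sparse_reward` and `dense_reward` and the constants `GOAL_RADIUS` and `GOAL_REWARD` are hypothetical and chosen only for illustration.

```python
import math

GOAL_RADIUS = 0.1  # hypothetical: how close counts as "reached"
GOAL_REWARD = 1.0  # hypothetical: reward paid on reaching the goal


def sparse_reward(position, goal):
    """Nonzero only when the robot is within GOAL_RADIUS of the goal."""
    return GOAL_REWARD if math.dist(position, goal) <= GOAL_RADIUS else 0.0


def dense_reward(position, goal):
    """Contrast case: negative distance gives a learning signal at every step."""
    return -math.dist(position, goal)
```

Under the sparse variant, almost every step in a long maze episode returns zero, so an agent that explores poorly may never see a nonzero reward at all.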

Abstract

Maze navigation is a fundamental challenge in robotics, requiring agents to traverse complex environments efficiently. While the Deep Deterministic Policy Gradient (DDPG) algorithm excels in control tasks, its performance in maze navigation suffers from sparse rewards, inefficient exploration, and long-horizon planning difficulties. These issues often lead to low success rates and low average rewards, and sometimes prevent effective navigation altogether. To address these limitations, this paper proposes an efficient Hierarchical DDPG (HDDPG) algorithm, which includes high-level and low-level policies. The high-level policy employs an advanced DDPG framework to generate intermediate subgoals from a long-term perspective and on a higher temporal scale. The low-level policy, also powered by the improved DDPG algorithm, generates primitive actions by observing current states and following the subgoal…
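
The two-level structure described in the abstract can be sketched as a control loop in which the high-level policy acts once every few steps and the low-level policy acts at every step. This is only an illustration under stated assumptions, not the authors' implementation: the two policy functions are hand-written stand-ins for trained DDPG actors, and `SUBGOAL_PERIOD`, the step bound, and the obstacle-free point dynamics are all hypothetical.

```python
import numpy as np

SUBGOAL_PERIOD = 5  # hypothetical: low-level steps per high-level decision


def high_level_policy(state, goal):
    """Stand-in for the high-level actor: propose a subgoal halfway to the goal."""
    return state + 0.5 * (goal - state)


def low_level_policy(state, subgoal):
    """Stand-in for the low-level actor: a bounded step toward the subgoal."""
    return np.clip(subgoal - state, -0.2, 0.2)


def rollout(start, goal, max_steps=100, tol=0.05):
    """Run one episode; return final state, steps taken, and subgoals issued."""
    state = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    subgoals = []
    for t in range(max_steps):
        # The high-level policy acts on a coarser time scale.
        if t % SUBGOAL_PERIOD == 0:
            subgoal = high_level_policy(state, goal)
            subgoals.append(subgoal)
        # The low-level policy acts every step, conditioned on the subgoal.
        action = low_level_policy(state, subgoal)
        state = state + action  # toy dynamics: no walls, no noise
        if np.linalg.norm(goal - state) < tol:
            return state, t + 1, subgoals
    return state, max_steps, subgoals
```

Calling `rollout([0.0, 0.0], [1.0, 1.0])` shows the intended division of labor: the low-level policy only ever sees the current subgoal, while the final goal enters the loop through the high-level policy alone.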
