# DSAC-ICM: A Distributional Reinforcement Learning Framework for Path Planning in 3D Uneven Terrains

**Authors:** Yixin Zhou, Fan Liu, Zhixiao Liu, Xianghan Ji, Guangqiang Yin

PMC · DOI: 10.3390/s26030853 · Sensors (Basel, Switzerland) · 2026-01-28

## TL;DR

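This paper introduces DSAC-ICM, a Distributional Soft Actor–Critic framework with an Intrinsic Curiosity Module (ICM) for global path planning of ground robots in 3D uneven terrains. It targets two weaknesses of deep reinforcement learning: systematic overestimation of Q-values and inefficient exploration under sparse rewards.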

## Contribution

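DSAC-ICM learns the full probability distribution of state-action returns instead of scalar Q-values, which mitigates value overestimation, and adds an Intrinsic Curiosity Module that supplies dense intrinsic rewards to improve exploration in 3D terrain path planning.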

## Key findings

- DSAC-ICM achieves a better trade-off between path quality and computational cost than traditional path planning algorithms.
- The method outperforms other RL baselines in convergence speed and return in 3D terrain environments.
- The ICM generates dense intrinsic rewards that guide the agent toward novel, unvisited states, improving exploration under sparse rewards.
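
To make the intrinsic-reward idea concrete, here is a minimal NumPy sketch of a curiosity bonus in the style of the original ICM formulation, where the bonus is the scaled squared error of a forward model that predicts the next state's features. This summary does not give the paper's exact formulation, so the function names, the scale `eta`, the mixing weight `beta`, and the random linear forward model are all hypothetical placeholders.

```python
import numpy as np

def intrinsic_reward(pred_next_feat, next_feat, eta=0.01):
    """Curiosity bonus: (eta / 2) * squared forward-model error in feature space.

    Works on a single feature vector or a batch (error is summed over the last axis).
    """
    pred = np.asarray(pred_next_feat, dtype=float)
    true = np.asarray(next_feat, dtype=float)
    return 0.5 * eta * np.sum((pred - true) ** 2, axis=-1)

def shaped_reward(r_ext, r_int, beta=1.0):
    """Reward given to the agent: sparse extrinsic term plus weighted curiosity bonus.

    beta is an assumed mixing weight, not a value taken from the paper.
    """
    return np.asarray(r_ext, dtype=float) + beta * np.asarray(r_int, dtype=float)

# Toy usage with a hypothetical, untrained linear forward model.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))            # maps [features (4), action (2)] -> next features (4)
feat = rng.normal(size=4)              # encoded current state
action = rng.normal(size=2)            # continuous action
next_feat = rng.normal(size=4)         # encoded next state actually observed

pred = np.concatenate([feat, action]) @ W
r_int = intrinsic_reward(pred, next_feat)
r_total = shaped_reward(0.0, r_int)    # extrinsic reward is zero on most sparse-reward steps
```

Transitions the forward model predicts poorly receive a larger bonus, so the agent still gets a learning signal on steps where the extrinsic reward is zero.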

## Abstract

Ground autonomous mobile robots are increasingly critical for reconnaissance, patrol, and resupply tasks in public safety and national defense scenarios, where global path planning in 3D uneven terrains remains a major challenge. Traditional planners struggle with high dimensionality, while Deep Reinforcement Learning (DRL) is hindered by two key issues: (1) systematic overestimation of action values (Q-values) due to function approximation error, which leads to suboptimal policies and training instability; and (2) inefficient exploration under sparse reward signals. To address these limitations, we propose DSAC-ICM: a Distributional Soft Actor–Critic framework integrated with an Intrinsic Curiosity Module (ICM). Our method fundamentally shifts the learning paradigm from estimating scalar Q-values to learning the full probability distribution of state-action returns, which inherently mitigates value overestimation. We further integrate the ICM to generate dense intrinsic rewards, guiding the agent toward novel and unvisited states to tackle the exploration challenge. Comprehensive experiments conducted in a suite of realistic 3D uneven-terrain environments demonstrate that DSAC-ICM successfully enables the agent to learn effective navigation capabilities. Crucially, it achieves a superior trade-off between path quality and computational cost when compared to traditional path planning algorithms. Furthermore, DSAC-ICM significantly outperforms other RL baselines in terms of convergence speed and return.
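
The shift from scalar Q-values to a return distribution can be sketched as follows. The abstract does not say how the distribution is parameterized, so this example assumes a Gaussian critic that outputs a mean and standard deviation, trained by negative log-likelihood against a soft (entropy-regularized) Bellman target. The function names and the values of `gamma` and `alpha` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_return_target(reward, next_return_sample, next_log_prob, done,
                       gamma=0.99, alpha=0.2):
    """One-sample soft Bellman target for the return distribution.

    next_return_sample: a return sampled from the target critic at (s', a')
    next_log_prob:      log pi(a' | s') of the sampled next action
    alpha:              entropy temperature (assumed value)
    """
    soft_next = next_return_sample - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_next

def gaussian_nll(target, mean, std):
    """Negative log-likelihood of the target under the critic's N(mean, std^2)."""
    var = std ** 2
    return 0.5 * np.log(2.0 * np.pi * var) + (target - mean) ** 2 / (2.0 * var)

# Toy usage: the critic predicts N(5.0, 2.0^2) for the current (s, a).
y = soft_return_target(reward=1.0, next_return_sample=10.0,
                       next_log_prob=-1.0, done=0.0, gamma=0.5, alpha=0.2)
loss = gaussian_nll(y, mean=5.0, std=2.0)
```

Minimizing this loss fits both the mean and the spread of returns, so the critic carries an explicit uncertainty estimate rather than a single point value.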

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899962/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899962/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899962/full.md

---
Source: https://tomesphere.com/paper/PMC12899962