# A substation robot path planning algorithm based on deep reinforcement learning enhanced by ant colony optimization

**Authors:** Hongwei Zhang, Lijun Sun, Weihong Tan, Siyu Bao, Xing He, Jinguo Chen

PMC · DOI: 10.3389/frobt.2025.1759501 · Frontiers in Robotics and AI · 2026-02-04

## TL;DR

This paper introduces a path planning algorithm for substation robots in which ant colony optimization guides deep reinforcement learning. It reports 24% higher sample efficiency and 18% shorter average paths than baselines including PPO, DDQN, and A*.

## Contribution

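The contribution is a framework in which ant colony optimization guides deep reinforcement learning for substation robot path planning through three mechanisms: pheromone-guided exploration, screening of high-quality training samples from ant colony paths, and a dynamic decision weight that gradually shifts control from heuristic guidance to the learned policy.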

## Key findings

- The proposed algorithm achieves 24% higher sample efficiency than baselines including PPO, DDQN, and A*.
- It reduces average path length by 18% relative to those baselines and shows better dynamic obstacle avoidance.
- Field validation in a 2,500-square-meter substation shows a 14.8% improvement in task completion rate over standard DRL approaches.

## Abstract

Substation robots face significant challenges in path planning due to the complex electromagnetic environment, dense equipment layout, and safety-critical operational requirements. This paper proposes a path planning algorithm based on deep reinforcement learning enhanced by ant colony optimization, establishing a synergistic optimization framework that combines bio-inspired algorithms with deep learning. The proposed method addresses critical path planning issues in substation inspection and maintenance operations. The approach includes: 1) designing a pheromone-guided exploration strategy that transforms environmental prior knowledge into spatial bias to reduce ineffective exploration; 2) establishing a high-quality sample screening mechanism that enhances Q-network training through ant colony path experience to improve sample efficiency; 3) implementing dynamic decision weight adjustment that enables gradual transition from heuristic guidance to autonomous learning decisions. Experimental results in complex environments demonstrate the method’s superiority. Compared to state-of-the-art baselines including PPO, DDQN, and A*, the proposed method achieves 24% higher sample efficiency, 18% reduction in average path length, and superior dynamic obstacle avoidance. Field validation in a 2,500-square-meter substation confirms a 14.8% improvement in task completion rate compared to standard DRL approaches.
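The three mechanisms listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the linear annealing schedule, the min-max normalization, the length-based screening rule, and every function and parameter name below (`guidance_weight`, `select_action`, `screen_samples`, `keep_fraction`) are assumptions chosen only to show the general idea.

```python
import math


def guidance_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Heuristic weight, annealed linearly from w_start to w_end.

    Assumed schedule; the paper's actual weight adjustment rule is not
    given in the abstract.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def normalize(values):
    """Min-max scale values to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_action(q_values, pheromone, step, total_steps):
    """Pick the action maximizing a blend of Q-values and pheromone levels.

    Early in training the pheromone term dominates (heuristic guidance);
    late in training the Q-values dominate (learned policy).
    """
    w = guidance_weight(step, total_steps)
    q_n = normalize(q_values)
    p_n = normalize(pheromone)
    scores = [(1.0 - w) * q + w * p for q, p in zip(q_n, p_n)]
    return max(range(len(scores)), key=scores.__getitem__)


def screen_samples(trajectories, keep_fraction=0.5):
    """Keep the shortest successful trajectories for Q-network training.

    Assumed criterion: success first, then path length. Each trajectory
    is a dict with hypothetical keys 'success' and 'length'.
    """
    ok = sorted((t for t in trajectories if t["success"]),
                key=lambda t: t["length"])
    keep = math.ceil(keep_fraction * len(ok))
    return ok[:keep]


if __name__ == "__main__":
    q = [0.1, 0.9, 0.2]      # Q-network estimates per action (made up)
    tau = [0.8, 0.1, 0.3]    # pheromone level per action (made up)
    print(select_action(q, tau, step=0, total_steps=1000))
    print(select_action(q, tau, step=1000, total_steps=1000))
```

With these made-up inputs, the first call follows the pheromone trail and the second follows the Q-values, which is the gradual handover the abstract describes. A real system would likely combine this blend with stochastic exploration and a learned Q-network rather than fixed lists.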

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12914723/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12914723/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12914723/full.md

---
Source: https://tomesphere.com/paper/PMC12914723