# Trustworthy navigation with variational policy in deep reinforcement learning

**Authors:** Karla Bockrath, Liam Ernst, Rohaan Nadeem, Bryan Pedraza, Dimah Dera

PMC · DOI: 10.3389/frobt.2025.1652050 · Frontiers in Robotics and AI · 2025-10-08

## TL;DR

This paper introduces Trust-Nav, a new framework for deep reinforcement learning that improves robot navigation by incorporating uncertainty estimation.

## Contribution

Trust-Nav integrates variational policy learning with uncertainty quantification for robot navigation and mapping.

## Key findings

- Trust-Nav outperforms deterministic DRL approaches in complex and noisy environments.
- The framework enables safer and more reliable autonomous navigation by embedding uncertainty in policy and mapping.
- Experiments in Gazebo show robust performance in unknown environments with adversarial attacks.

## Abstract

Developing a reliable and trustworthy navigation policy in deep reinforcement learning (DRL) for mobile robots is extremely challenging, particularly in real-world, highly dynamic environments. Exploring and navigating unknown environments without prior knowledge, while avoiding obstacles and collisions, is especially difficult for mobile robots.

This study introduces Trust-Nav, a novel trustworthy navigation framework that uses variational policy learning to quantify uncertainty in the estimation of the robot’s action, localization, and map representation. Trust-Nav employs a Bayesian variational approximation of the posterior distribution over the parameters of the policy network. Policy-based and value-based learning are combined to guide the robot’s actions in unknown environments. We derive the propagation of the variational moments through all layers of the policy network and employ a first-order approximation for the nonlinear activation functions. The uncertainty in the robot’s action is measured by the variational covariance propagated through the DRL policy network. The uncertainty in the robot’s localization and mapping is embedded in the reward function and is derived from the classical theory of optimal experimental design. The total loss function optimizes the parameters of the policy and value networks to maximize the robot’s cumulative reward in an unknown environment.
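The moment propagation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a mean-field Gaussian posterior (independent weights and biases, independent of the layer input), a `tanh` activation, and the D-optimality criterion as one possible scalar summary from optimal experimental design; the abstract does not specify these choices. The function names `linear_moments`, `tanh_moments`, and `d_optimality` are hypothetical.

```python
import numpy as np


def linear_moments(mu, sigma, w_mean, w_var, b_mean, b_var):
    """Propagate a mean and covariance through z = W x + b.

    Assumption (not from the paper): entries of W and b are independent
    Gaussians with means w_mean, b_mean and variances w_var, b_var, and
    they are independent of the input x ~ (mu, sigma).
    """
    out_mean = w_mean @ mu + b_mean
    # E[x_j^2] = Var[x_j] + E[x_j]^2, needed for the weight-variance term.
    second_moment = np.diag(sigma) + mu ** 2
    out_cov = w_mean @ sigma @ w_mean.T + np.diag(w_var @ second_moment + b_var)
    return out_mean, out_cov


def tanh_moments(mu, sigma):
    """First-order Taylor approximation of tanh around the input mean."""
    out_mean = np.tanh(mu)
    grad = 1.0 - out_mean ** 2               # derivative of tanh at mu
    out_cov = sigma * np.outer(grad, grad)   # J sigma J^T with J = diag(grad)
    return out_mean, out_cov


def d_optimality(cov):
    """D-optimality of a covariance matrix: det(cov) ** (1 / n)."""
    _, logdet = np.linalg.slogdet(cov)
    return float(np.exp(logdet / cov.shape[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_mean = rng.normal(size=3)          # hypothetical 3-D observation
    state_cov = np.zeros((3, 3))             # observation treated as exact
    w_mean = rng.normal(size=(2, 3))         # hypothetical 2-D action head
    w_var = np.full((2, 3), 0.05)
    b_mean, b_var = np.zeros(2), np.full(2, 0.01)

    z_mean, z_cov = linear_moments(state_mean, state_cov, w_mean, w_var, b_mean, b_var)
    a_mean, a_cov = tanh_moments(z_mean, z_cov)
    print("action mean:", a_mean)
    print("action covariance:\n", a_cov)
    print("D-optimality of action covariance:", d_optimality(a_cov))
```

In this sketch the output covariance plays the role of the action uncertainty: larger diagonal entries indicate actions the policy is less certain about. The D-optimality helper is applied to the action covariance only for illustration; in the paper the experimental-design term enters the reward through the localization and mapping uncertainty.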

Experiments conducted using the Gazebo robotics simulator demonstrate the superior performance of the proposed Trust-Nav model in achieving robust autonomous navigation and mapping.

Trust-Nav consistently outperforms deterministic DRL approaches, particularly in complicated environments involving noisy conditions and adversarial attacks. This integration of uncertainty into the policy network promotes safer and more reliable navigation, especially in complex or unpredictable environments. Trust-Nav offers a step toward deployable, self-aware robotic systems capable of recognizing and responding to their own limitations.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12541417/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12541417/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12541417/full.md

---
Source: https://tomesphere.com/paper/PMC12541417