# Solving robotics tasks with prior demonstration via exploration-efficient deep reinforcement learning

**Authors:** Chengyandan Shen, Christoffer Sloth

PMC · DOI: 10.3389/frobt.2025.1682200 · Frontiers in Robotics and AI · 2026-01-12

## TL;DR

This paper introduces a deep reinforcement learning with reference (DRLR) policy framework, built on imitation bootstrapped reinforcement learning (IBRL), that learns robotics tasks from demonstrations while reducing inefficient exploration.

## Contribution

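DRLR makes two changes to IBRL. First, a modified action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Second, soft actor–critic (SAC) replaces twin delayed DDPG (TD3) as the RL policy, which prevents convergence to a sub-optimal policy.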
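To make the role of an action selection module concrete, here is a minimal sketch of the baseline idea as it is commonly described for IBRL: at each step the agent compares the action proposed by the imitation (reference) policy with the action proposed by the RL policy and executes whichever one the critic scores higher. This is an illustration under stated assumptions, not the DRLR module itself: the calibration that DRLR applies to the Q-value is not described in this summary view, and `select_action` and `toy_q` are hypothetical names invented for this example.

```python
import numpy as np


def select_action(q_fn, state, a_il, a_rl):
    """Pick between a reference-policy action and an RL-policy action.

    q_fn is any callable (state, action) -> scalar Q estimate.
    Returns the chosen action and a label naming its source.
    Ties go to the reference action because argmax returns the first maximum.
    """
    candidates = [a_il, a_rl]
    q_values = [float(q_fn(state, a)) for a in candidates]
    best = int(np.argmax(q_values))
    return candidates[best], ("il", "rl")[best]


def toy_q(state, action):
    """Hypothetical critic for illustration: prefers actions near 0.5 * state."""
    return -np.sum((action - 0.5 * state) ** 2)


state = np.array([1.0, -2.0])
a_il = np.array([0.4, -0.9])   # stand-in for the reference policy's proposal
a_rl = np.array([1.0, 1.0])    # stand-in for the RL policy's proposal

action, source = select_action(toy_q, state, a_il, a_rl)
print(source, action)
```

Because the choice depends entirely on the critic's estimates, an inaccurate Q-value for either candidate changes which action is executed, which is why the quality of the Q-value used in this comparison matters.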

## Key findings

- The DRLR framework learns two robotics tasks that require extensive interaction with the environment: bucket loading and open drawer.
- In simulation, the framework is robust across tasks with both low and high state–action dimensions and across demonstrations of varying quality.
- The DRLR framework was successfully deployed on a real-world wheel loader for bucket loading.

## Abstract

This paper proposes an exploration-efficient deep reinforcement learning with reference (DRLR) policy framework for learning robotics tasks incorporating demonstrations. The DRLR framework is developed based on an imitation bootstrapped reinforcement learning (IBRL) algorithm. Here, we propose to improve IBRL by modifying the action selection module. The proposed action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Furthermore, to prevent the reinforcement learning (RL) policy from converging to a sub-optimal policy, soft actor–critic (SAC) is used as the RL policy instead of twin delayed DDPG (TD3). The effectiveness of our method in mitigating the bootstrapping error and preventing overfitting is empirically validated by learning two robotics tasks: bucket loading and open drawer, which require extensive interactions with the environment. Simulation results also demonstrate the robustness of the DRLR framework across tasks with both low and high state–action dimensions and varying demonstration qualities. To evaluate the developed framework on a real-world industrial robotics task, the bucket loading task is deployed on a real wheel loader. The sim-to-real results validate the successful deployment of the DRLR framework.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12832430/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12832430/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC12832430/full.md

---
Source: https://tomesphere.com/paper/PMC12832430