Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism

Youssef Mahran; Zeyad Gamal; Ayman El-Badawy

arXiv:2512.18336·cs.RO·December 23, 2025

Open Access

TL;DR

This paper investigates how dynamic entropy tuning in reinforcement learning affects low-level quadcopter control, comparing stochastic and deterministic policies, and demonstrates that dynamic entropy improves exploration and prevents catastrophic forgetting.

Contribution

The paper applies dynamic entropy tuning to RL-based quadcopter control and shows its benefits over static entropy and deterministic approaches.

Findings

01. Dynamic entropy tuning enhances exploration efficiency.

02. Dynamic entropy prevents catastrophic forgetting.

03. Stochastic policies with dynamic entropy outperform deterministic policies in control tasks.

Abstract

This paper explores the impact of dynamic entropy tuning in Reinforcement Learning (RL) algorithms that train a stochastic policy, and compares their performance against algorithms that train a deterministic policy. Stochastic policies optimize a probability distribution over actions to maximize rewards, while deterministic policies map each state to a single action. The paper examines the effect of training a stochastic policy with either static or dynamic entropy and then executing deterministic actions to control the quadcopter, and compares this against training a deterministic policy and executing deterministic actions. The Soft Actor-Critic (SAC) algorithm was chosen as the stochastic algorithm and the Twin Delayed Deep Deterministic Policy Gradient (TD3) as the deterministic algorithm. The training and simulation results…
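To make the static versus dynamic entropy distinction concrete, here is a minimal NumPy sketch of the temperature update from the standard SAC formulation, together with deterministic execution of a policy trained as stochastic. It is an illustration under stated assumptions, not the authors' implementation: the function names, the learning rate, the four-dimensional action space, and the target entropy heuristic of minus the action dimension are all assumptions that the abstract does not confirm. With static entropy, the temperature `alpha` would simply stay fixed instead of being updated.

```python
import numpy as np


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient step on the standard SAC temperature loss
    J(alpha) = E[-alpha * (log_pi(a|s) + target_entropy)].

    Hypothetical helper; alpha is parameterised as exp(log_alpha) so it stays positive.
    """
    alpha = np.exp(log_alpha)
    # Gradient of J with respect to log_alpha (chain rule through alpha = exp(log_alpha)).
    grad = -alpha * np.mean(log_probs + target_entropy)
    return log_alpha - lr * grad


def select_action(mean, log_std, rng=None):
    """Return a tanh-squashed Gaussian action.

    With rng given, sample (stochastic, as during training).
    With rng=None, return the squashed mean (deterministic execution).
    """
    if rng is None:
        return np.tanh(mean)
    noise = rng.standard_normal(mean.shape)
    return np.tanh(mean + np.exp(log_std) * noise)


# Assumption: four low-level motor commands; the paper's action space may differ.
action_dim = 4
# Common heuristic for the target entropy, not taken from the paper.
target_entropy = -float(action_dim)

log_alpha = 0.0
# Policy entropy (-mean log_prob = -6) is below the target (-4): alpha should rise.
raised = update_log_alpha(log_alpha, np.full(8, 6.0), target_entropy)
# Policy entropy (-1) is above the target (-4): alpha should fall.
lowered = update_log_alpha(log_alpha, np.full(8, 1.0), target_entropy)

mean = np.array([0.2, -0.1, 0.0, 0.5])
log_std = np.full(action_dim, -1.0)
deterministic_action = select_action(mean, log_std)
sampled_action = select_action(mean, log_std, rng=np.random.default_rng(0))
```

In this sketch the temperature grows when the policy's entropy falls below the target, which encourages more exploration, and shrinks when the entropy exceeds it. Executing `select_action` without a random generator corresponds to the deterministic control mode the abstract describes for a policy trained as stochastic.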

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · UAV Applications and Optimization