Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies
Sean Gillen, Asutay Ozmen, Katie Byl

TL;DR
This paper demonstrates that direct random search effectively fine-tunes deterministic policies in deep reinforcement learning, resulting in more consistent and higher-performing agents across various environments.
Contribution
It introduces a simple yet effective method for fine-tuning DRL policies through direct random search, improving performance and consistency.
Findings
More consistent agent performance across environments.
Higher average rewards compared to baseline policies.
Effective extension to state space reduction techniques.
Abstract
Researchers have demonstrated that Deep Reinforcement Learning (DRL) is a powerful tool for finding policies that perform well on complex robotic systems. However, these policies are often unpredictable and can induce highly variable behavior when evaluated with only slightly different initial conditions. Training considerations constrain DRL algorithm designs in that most algorithms must use stochastic policies during training. The policy used during deployment, however, can be, and frequently is, a deterministic one that uses the Maximum Likelihood Action (MLA) at each step. In this work, we show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts. We illustrate this across a large collection of reinforcement learning environments, using a wide variety of policies obtained from different algorithms.…
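The idea of fine-tuning a policy by random search over deterministic rollouts can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy double-integrator dynamics, the linear policy, the quadratic cost, the fixed set of initial conditions, and the function names `rollout`, `evaluate`, and `random_search`. The search shown is a basic accept-if-better hill climb, which may differ from the random search variant the authors actually use.

```python
import numpy as np

def rollout(theta, x0, steps=50):
    """Deterministic rollout of a linear policy on a toy double integrator.

    Hypothetical stand-in for a DRL policy acting with its most likely action:
    there is no sampling anywhere, so the return depends only on theta and x0.
    """
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.0, 0.1])
    x = np.array(x0, dtype=float)
    total = 0.0
    for _ in range(steps):
        u = float(np.clip(-theta @ x, -1.0, 1.0))  # deterministic action
        x = A @ x + B * u
        total -= float(x @ x) + 0.1 * u * u        # reward = negative quadratic cost
    return total

def evaluate(theta, starts):
    """Mean return over a fixed set of initial conditions."""
    return float(np.mean([rollout(theta, x0) for x0 in starts]))

def random_search(theta0, starts, iters=200, sigma=0.1, seed=0):
    """Perturb the parameters with Gaussian noise; keep a candidate only if it scores higher."""
    rng = np.random.default_rng(seed)
    best_theta = np.array(theta0, dtype=float)
    best_score = evaluate(best_theta, starts)
    for _ in range(iters):
        cand = best_theta + sigma * rng.standard_normal(best_theta.shape)
        score = evaluate(cand, starts)
        if score > best_score:
            best_theta, best_score = cand, score
    return best_theta, best_score

# "Pretrained" parameters (assumed) and a few nearby initial conditions.
starts = [(1.0, 0.0), (-1.0, 0.5), (0.5, -0.5), (0.0, 1.0)]
theta0 = np.array([0.5, 0.5])
base_score = evaluate(theta0, starts)
tuned_theta, tuned_score = random_search(theta0, starts)
```

Because every rollout is deterministic, two candidates can be compared on the same initial conditions without noise from action sampling, and the accept-if-better rule guarantees the tuned score is never below the starting score on those conditions.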
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
Methods: Random Search
