TL;DR
This paper introduces a method that auto-tunes hyperparameters in self-supervised reinforcement learning using the Evidence Lower Bound (ELBO), improving policy performance and training efficiency in robotic learning tasks.
Contribution
It proposes an ELBO-based auto-tuning technique for three hyperparameters in self-supervised RL: the replay buffer size, the number of policy gradient updates per epoch, and the number of exploration steps.
Findings
Auto-tuning improves policy performance.
Auto-tuning reduces the time and computational resources needed for training.
The approach is effective in both simulated and real-robot experiments.
Abstract
Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance, leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources. The effects are further exacerbated when using single policies to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the diversity of image samples, we propose an auto-tuning technique based on the ELBO for self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps…
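For readers unfamiliar with the quantity the abstract relies on, the sketch below computes a standard per-sample ELBO for a VAE with a diagonal-Gaussian encoder and a standard-normal prior. It is a minimal illustration, not the paper's implementation: the unit-variance Gaussian decoder, the NumPy formulation, and the function names `gaussian_kl` and `elbo` are all assumptions made here. How the paper maps this value to the three hyperparameters is not described in the text above, so it is not shown.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def elbo(x, x_recon, mu, log_var):
    """Per-sample ELBO = reconstruction log-likelihood - KL term.

    Assumes a unit-variance Gaussian decoder (an illustrative choice), so the
    log-likelihood is -0.5 * squared error, up to an additive constant that
    does not depend on the model and is dropped here.
    """
    recon_log_lik = -0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon_log_lik - gaussian_kl(mu, log_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))                  # 4 flattened "images", 8 pixels each
    x_recon = x + 0.1 * rng.normal(size=x.shape)  # hypothetical decoder output
    mu = 0.5 * rng.normal(size=(4, 2))            # hypothetical encoder means
    log_var = np.zeros((4, 2))                    # hypothetical encoder log-variances
    print(elbo(x, x_recon, mu, log_var))          # one ELBO value per sample
```

A higher ELBO means the samples are reconstructed well while the encoder stays close to the prior; the abstract's observation is that this value correlates with the diversity of the image samples.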
