Hyperparameter Auto-tuning in Self-Supervised Robotic Learning

Jiancong Huang; Juan Rojas; Matthieu Zimmer; Hongmin Wu; Yisheng Guan; and Paul Weng

arXiv:2010.08252 · cs.RO · March 26, 2021

TL;DR

This paper introduces an auto-tuning method for hyperparameters in self-supervised reinforcement learning using ELBO, improving performance and efficiency in robotic learning tasks.

Contribution

It proposes a novel auto-tuning technique based on ELBO for three key hyperparameters in self-supervised RL, enhancing efficiency and performance.

Findings

01 Auto-tuning improves policy performance.

02 Auto-tuning reduces the time and computational resources needed.

03 The approach is effective in both simulated and real-robot experiments.

Abstract

Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance, leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources. The effects are further exacerbated when using single policies to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the diversity of image samples, we propose an auto-tuning technique based on the ELBO for self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps…
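
For readers unfamiliar with the ELBO mentioned in the abstract, the following is a minimal sketch of how it is commonly computed for a single sample in a Variational Auto-Encoder. It assumes a diagonal Gaussian encoder, a standard normal prior, and a unit-variance Gaussian decoder (with additive constants dropped). The function name `gaussian_elbo` and its arguments are hypothetical and chosen for illustration. This is the textbook quantity only; it is not the paper's tuning rule, which this summary does not describe.

```python
import math


def gaussian_elbo(x, x_recon, mu, log_var):
    """Single-sample ELBO estimate under illustrative assumptions.

    x, x_recon : input and its reconstruction (sequences of floats)
    mu, log_var: mean and log-variance of the encoder's diagonal Gaussian
    """
    # Reconstruction term: log-likelihood of x under a unit-variance
    # Gaussian decoder, with the additive constant dropped.
    recon_log_lik = -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))

    # KL divergence between N(mu, diag(exp(log_var))) and N(0, I),
    # in closed form.
    kl = 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

    # ELBO = expected reconstruction log-likelihood minus the KL term.
    return recon_log_lik - kl
```

A higher value means the model reconstructs the sample well while keeping its latent code close to the prior. The abstract's observation is that this quantity correlates with the diversity of the image samples.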
