Offline Reinforcement Learning at Multiple Frequencies

Kaylee Burns; Tianhe Yu; Chelsea Finn; Karol Hausman

arXiv:2207.13082 · cs.LG · July 27, 2022 · 1 citation

Open Access

TL;DR

This paper studies offline reinforcement learning from data collected at multiple control frequencies. It proposes stabilizing learning by keeping the rate of Q-value updates consistent across discretizations, which improves performance over naive mixing of the data.

Contribution

The paper introduces a simple technique that equalizes the rate of Q-value propagation across data collected at different control frequencies in offline RL, improving stability and performance.

Findings

01. Outperforms naive mixing by 50% on average in simulated robotic tasks.

02. Enforces consistency in Q-value updates across different discretizations.

03. Improves convergence stability in multi-frequency offline RL settings.

Abstract

Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns…
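The idea of scaling $N$ in $N$-step returns can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $N$ is chosen so that $N$ times the control step duration stays roughly constant across datasets, so that each backup spans a similar amount of physical time. The names `adaptive_n`, `n_step_target`, `dt`, and `reference_dt` are hypothetical and do not come from the paper.

```python
import numpy as np


def adaptive_n(dt, reference_dt, reference_n=1):
    """Pick N so that N * dt is close to reference_n * reference_dt.

    Assumption (not from the source): data with a finer step size dt
    gets a proportionally larger N.
    """
    return max(1, int(round(reference_n * reference_dt / dt)))


def n_step_target(rewards, values, t, n, gamma):
    """N-step return target for step t of one trajectory.

    rewards[k] is the reward at step k (length T).
    values[k] is a value estimate for the state at step k (length T + 1,
    where values[T] stands for the state after the final step).
    The sum is truncated if fewer than n steps remain.
    """
    end = min(t + n, len(rewards))
    discounts = gamma ** np.arange(end - t)
    target = float(np.dot(discounts, rewards[t:end]))
    # Bootstrap from the value estimate at the end of the window.
    target += gamma ** (end - t) * float(values[end])
    return target


if __name__ == "__main__":
    # Hypothetical step sizes in seconds; the coarsest one is the reference.
    for dt in (0.01, 0.02, 0.04):
        print(dt, adaptive_n(dt, reference_dt=0.04))
```

With this choice, a dataset sampled four times faster than the reference uses a four-times-longer backup window, so value information moves through a similar span of time per update in both datasets. How rewards and the discount factor should be rescaled for each step size is left out here and would need to follow the paper.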


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques