Equivariant Offline Reinforcement Learning

Arsh Tangri; Ondrej Biza; Dian Wang; David Klee; Owen Howell; Robert Platt

arXiv:2406.13961·cs.LG·June 21, 2024



TL;DR

This paper explores the use of $SO(2)$-equivariant neural networks in offline reinforcement learning (RL) for robotic manipulation, showing that exploiting the rotational symmetry of these tasks improves performance when only a limited number of demonstrations is available.
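To make the symmetry concrete: a policy is rotation-equivariant if rotating the input state rotates the output action, $\pi(g s) = g\,\pi(s)$, and a critic is invariant if $Q(g s, g a) = Q(s, a)$. The sketch below obtains both properties by averaging an ordinary network over the cyclic group $C_N$ of N rotations, a discrete subgroup of $SO(2)$. This symmetrization trick is a stand-in chosen for brevity, not the paper's architecture (equivariant networks of the kind studied here typically build the constraint into every layer). The toy 2D states and actions, the random weights, N = 8, and all function names are hypothetical, and equivariance holds exactly only for the N rotations in the group.

```python
import numpy as np

def rot(theta: float) -> np.ndarray:
    """2x2 rotation matrix, an element of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

N = 8  # hypothetical group order: C_8 = rotations by multiples of 45 degrees
GROUP = [rot(2 * np.pi * k / N) for k in range(N)]

# Hypothetical random weights for plain (non-equivariant) one-hidden-layer MLPs.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(16, 2)), rng.normal(size=(2, 16))
V1, v2 = rng.normal(size=(16, 4)), rng.normal(size=16)

def base_policy(s: np.ndarray) -> np.ndarray:
    """Ordinary map from a 2D state to a 2D action; not equivariant."""
    return W2 @ np.tanh(W1 @ s)

def base_critic(s: np.ndarray, a: np.ndarray) -> float:
    """Ordinary Q(s, a) on the concatenated state-action vector; not invariant."""
    return float(v2 @ np.tanh(V1 @ np.concatenate([s, a])))

def equivariant_policy(s: np.ndarray) -> np.ndarray:
    """(1/N) * sum_g g^{-1} pi(g s); satisfies pi(h s) = h pi(s) for h in C_N.
    For rotation matrices the inverse is the transpose."""
    return sum(g.T @ base_policy(g @ s) for g in GROUP) / N

def invariant_critic(s: np.ndarray, a: np.ndarray) -> float:
    """(1/N) * sum_g Q(g s, g a); satisfies Q(h s, h a) = Q(s, a) for h in C_N."""
    return sum(base_critic(g @ s, g @ a) for g in GROUP) / N

s, a = np.array([0.3, -1.2]), np.array([0.5, 0.1])
h = GROUP[3]  # rotation by 135 degrees
print(np.allclose(equivariant_policy(h @ s), h @ equivariant_policy(s)))  # True
print(np.isclose(invariant_critic(h @ s, h @ a), invariant_critic(s, a)))  # True
```

Averaging works because summing over every group element and then re-indexing the sum by $g h$ pulls the rotation $h$ outside; the cost is N forward passes per call, which is one reason layer-wise equivariant architectures are preferred in practice.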

Contribution

It integrates $SO(2)$-equivariant neural networks into two offline RL algorithms, Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL), and shows that the equivariant versions outperform their non-equivariant counterparts on data-limited robotic manipulation tasks.

Findings

01. Equivariant CQL and IQL outperform their non-equivariant counterparts.

02. Equivariance improves offline RL performance in low-data regimes.

03. Empirical results support the benefit of symmetry-aware networks.
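The findings concern equivariant versions of CQL. As a reminder of what CQL adds to ordinary Q-learning, here is a minimal sketch of its conservative regularizer in the discrete-action CQL(H) form: the log-sum-exp of Q over all actions minus Q at the dataset action, added to a squared Bellman error. Assumptions: discrete actions, NumPy arrays, and the hypothetical names `cql_penalty`, `cql_loss`, and `alpha`. The paper's manipulation tasks may use continuous actions, where the log-sum-exp is estimated from sampled actions, and its hyperparameters are not stated here. In an equivariant variant, presumably only the critic that produces `q_values` changes, not the loss.

```python
import numpy as np

def logsumexp(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log(sum(exp(x))) along `axis`."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def cql_penalty(q_values: np.ndarray, actions: np.ndarray) -> float:
    """Conservative term: mean over the batch of logsumexp_a Q(s, a) - Q(s, a_data).
    q_values: (batch, num_actions); actions: (batch,) integer dataset actions.
    It is always >= 0 and shrinks as Q at dataset actions dominates other actions."""
    q_data = q_values[np.arange(len(actions)), actions]
    return float(np.mean(logsumexp(q_values, axis=1) - q_data))

def cql_loss(q_values: np.ndarray, actions: np.ndarray,
             td_targets: np.ndarray, alpha: float = 1.0) -> float:
    """Squared Bellman error at dataset actions plus alpha times the CQL penalty.
    td_targets are treated as fixed (e.g. computed with a target network)."""
    q_data = q_values[np.arange(len(actions)), actions]
    bellman = 0.5 * np.mean((q_data - td_targets) ** 2)
    return float(bellman + alpha * cql_penalty(q_values, actions))

# With all-zero Q-values over 4 actions the penalty equals log(4), about 1.386.
print(cql_penalty(np.zeros((2, 4)), np.array([0, 3])))
```

Minimizing the penalty pushes down Q-values of actions the dataset never took while pushing up those it did, which is how CQL avoids overestimating out-of-distribution actions when no new data can be collected.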

Abstract

Sample efficiency is critical when applying learning-based methods to robotic manipulation due to the high cost of collecting expert demonstrations and the challenges of on-robot policy learning through online Reinforcement Learning (RL). Offline RL addresses this issue by enabling policy learning from an offline dataset collected using any behavioral policy, regardless of its quality. However, recent advancements in offline RL have predominantly focused on learning from large datasets. Given that many robotic manipulation tasks can be formulated as rotation-symmetric problems, we investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations. Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts. We provide empirical evidence…
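The abstract also names Implicit Q-Learning (IQL). A minimal sketch of IQL's core pieces, assuming the standard formulation: expectile regression of $V(s)$ toward $Q(s,a)$ at dataset actions, a TD target for $Q$ that bootstraps from $V(s')$, and advantage-weighted policy extraction. The function names and the defaults for `tau`, `gamma`, `beta`, and `max_weight` are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def expectile_loss(diff: np.ndarray, tau: float = 0.7) -> float:
    """Asymmetric squared loss L_2^tau(u) = |tau - 1(u < 0)| * u^2, averaged.
    diff = Q(s, a) - V(s) on dataset (s, a) pairs; tau > 0.5 weights positive
    residuals more, so V is pulled toward the upper range of Q."""
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))

def q_target(rewards: np.ndarray, next_v: np.ndarray,
             dones: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """TD target r + gamma * V(s') for the Q update; no action is sampled at s'."""
    return rewards + gamma * (1.0 - dones) * next_v

def awr_weights(q: np.ndarray, v: np.ndarray,
                beta: float = 3.0, max_weight: float = 100.0) -> np.ndarray:
    """Advantage weights exp(beta * (Q - V)), clipped, used to weight the
    log-likelihood of dataset actions during policy extraction."""
    return np.minimum(np.exp(beta * (q - v)), max_weight)

# Scalar V fitted to sample Q-values by grid search: tau = 0.5 recovers the mean,
# larger tau moves the minimizer upward.
samples = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
grid = np.linspace(0.0, 10.0, 10001)
for tau in (0.5, 0.9):
    best = grid[np.argmin([expectile_loss(samples - m, tau) for m in grid])]
    print(tau, round(float(best), 3))
```

With `tau = 0.5` the loss is half the ordinary squared error, so $V$ regresses to the mean of $Q$; raising `tau` toward 1 approximates a maximum over in-dataset actions, which lets IQL improve on the behavior policy without ever querying out-of-distribution actions.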


Taxonomy

Topics: Elevator Systems and Control · Adaptive Dynamic Programming Control

Methods: Q-Learning