Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning
Stephen James, Pieter Abbeel

TL;DR
This paper introduces Bingham Policy Parameterization (BPP), a novel method for representing 3D rotations in reinforcement learning, which outperforms Gaussian policies in rotation-specific tasks.
Contribution
The paper proposes BPP, a new policy parameterization based on the Bingham distribution, tailored for better 3D rotation prediction in reinforcement learning environments.
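To make the contrast with a Gaussian concrete, here is a minimal sketch of the standard unnormalized Bingham log-density over unit quaternions, log p(q) = q^T M diag(z) M^T q + const. It is not the paper's implementation: the function name, the identity orientation matrix M, and the concentration values z are illustrative assumptions, and the normalizing constant is omitted. The sketch shows that q and -q, which encode the same rotation, always receive the same density. A Gaussian over the four quaternion components does not have this property in general.

```python
import math


def bingham_log_density_unnormalized(q, M, z):
    """Return q^T M diag(z) M^T q, the Bingham log-density up to a constant.

    q: unit quaternion as a sequence of 4 floats.
    M: 4x4 orthogonal matrix as a list of rows; its columns are the
       principal directions of the distribution.
    z: 4 concentration values, conventionally z[0] <= z[1] <= z[2] <= z[3] = 0.
    """
    total = 0.0
    for j in range(4):
        # Projection of q onto the j-th column of M.
        proj = sum(M[i][j] * q[i] for i in range(4))
        total += z[j] * proj * proj
    return total


def normalize(q):
    """Scale a 4-vector to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return [c / n for c in q]


# Hypothetical parameters (assumptions for illustration only):
# identity orientation, mode along the last axis.
M = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
z = [-10.0, -10.0, -10.0, 0.0]

q = normalize([0.1, -0.2, 0.3, 0.9])
neg_q = [-c for c in q]
mode = [0.0, 0.0, 0.0, 1.0]

log_p = bingham_log_density_unnormalized(q, M, z)
log_p_neg = bingham_log_density_unnormalized(neg_q, M, z)
log_p_mode = bingham_log_density_unnormalized(mode, M, z)

print(log_p == log_p_neg)   # antipodal symmetry: q and -q score the same
print(log_p < log_p_mode)   # the mode has the highest log-density
```

Because the density depends on q only through squared projections, the sign of q never matters. This is the property that makes the Bingham distribution a natural fit for quaternion outputs.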
Findings
BPP outperforms Gaussian policies in rotation tasks.
The gains are reported on the rotation Wahba problem and on a set of vision-based next-best pose robot manipulation tasks from RLBench.
The authors encourage the development of environment-specific policy parameterizations.
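For readers unfamiliar with the Wahba problem, the sketch below evaluates its classical objective: given reference vectors and their rotated observations, find the rotation that minimizes the summed squared error. The paper's exact task setup is not described in this summary, so the unweighted loss, the (w, x, y, z) quaternion ordering, and all names here are assumptions made for illustration.

```python
import math


def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q, assumed ordered (w, x, y, z)."""
    w, x, y, z = q
    # Standard rotation matrix built from a unit quaternion.
    r = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def wahba_loss(q, refs, obs):
    """Unweighted Wahba objective: sum of squared errors ||w_i - R(q) v_i||^2."""
    total = 0.0
    for v, w in zip(refs, obs):
        rv = quat_rotate(q, v)
        total += sum((a - b) ** 2 for a, b in zip(w, rv))
    return total


# Ground-truth rotation for the toy example: 90 degrees about the z axis.
s = math.sqrt(0.5)
q_true = [s, 0.0, 0.0, s]

refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
obs = [quat_rotate(q_true, v) for v in refs]

loss_true = wahba_loss(q_true, refs, obs)
loss_neg = wahba_loss([-c for c in q_true], refs, obs)
loss_identity = wahba_loss([1.0, 0.0, 0.0, 0.0], refs, obs)

print(loss_true, loss_neg, loss_identity)
```

The loss is identical for q_true and its negation, so any policy that outputs quaternions faces two equally correct targets for every rotation. A distribution that treats q and -q as the same point sidesteps that ambiguity.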
Abstract
We propose a new policy parameterization for representing 3D rotations during reinforcement learning. In today's continuous control reinforcement learning literature, many stochastic policy parameterizations are Gaussian. We argue that universally applying a Gaussian policy parameterization is not desirable for all environments. One such case is tasks that involve predicting a 3D rotation output, either in isolation or coupled with translation as part of a full 6D pose output. Our proposed Bingham Policy Parameterization (BPP) models the Bingham distribution and allows for better rotation (quaternion) prediction than a Gaussian policy parameterization in a range of reinforcement learning tasks. We evaluate BPP on the rotation Wahba problem task, as well as a set of vision-based next-best pose robot manipulation tasks from RLBench. We hope that…
Taxonomy
Topics: Robotic Mechanisms and Dynamics · Reinforcement Learning in Robotics · Hereditary Neurological Disorders
