q-exponential family for policy optimization

Lingwei Zhu; Haseeb Shah; Han Wang; Yukie Nagai; Martha White

arXiv:2408.07245 · cs.LG · January 27, 2025

TL;DR

This paper studies the q-exponential family of policies for reinforcement learning. It finds that heavy-tailed policies such as the Student's t-distribution can consistently improve on Gaussian policies across several actor-critic algorithms, and that a heavy-tailed q-Gaussian performs consistently well in offline benchmarks.

Contribution

It extends policy parametrization from the Gaussian to the q-exponential family, which covers both heavy-tailed (q > 1) and light-tailed (q < 1) policies, and compares these policies with the Gaussian in online and offline actor-critic settings.

Findings

01. Heavy-tailed policies outperform Gaussian in general.

02. Student's t-distribution shows increased stability.

03. Heavy-tailed q-Gaussian performs well in offline benchmarks.

Abstract

Policy optimization methods benefit from a simple and tractable policy parametrization, usually the Gaussian for continuous action spaces. In this paper, we consider a broader policy family that remains tractable: the $q$-exponential family. This family of policies is flexible, allowing the specification of both heavy-tailed policies ($q > 1$) and light-tailed policies ($q < 1$). This paper examines the interplay between $q$-exponential policies for several actor-critic algorithms conducted on both online and offline problems. We find that heavy-tailed policies are more effective in general and can consistently improve on Gaussian. In particular, we find the Student's t-distribution to be more stable than the Gaussian across settings and that a heavy-tailed $q$-Gaussian for Tsallis Advantage Weighted Actor-Critic consistently performs well in offline benchmark problems. Our code is…
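
To make the heavy-tailed versus light-tailed distinction concrete, the sketch below evaluates an unnormalized q-Gaussian kernel in plain Python. It assumes the standard Tsallis convention $\exp_q(x) = [1 + (1-q)x]_+^{1/(1-q)}$, which matches the abstract's statement that $q > 1$ gives heavy tails; the function names `exp_q` and `q_gaussian_kernel` and the mean/scale parametrization are hypothetical and are not taken from the authors' repository.

```python
import math


def exp_q(x: float, q: float) -> float:
    """q-exponential, assuming the convention [1 + (1 - q) x]_+ ** (1 / (1 - q))."""
    if math.isclose(q, 1.0):
        # The q -> 1 limit recovers the ordinary exponential.
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Truncation [.]_+: zero for q < 1 (compact support),
        # divergence for q > 1 (negative exponent applied to zero).
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def q_gaussian_kernel(a: float, q: float, mean: float = 0.0, scale: float = 1.0) -> float:
    """Unnormalized q-Gaussian kernel exp_q(-z^2 / 2); illustrative parametrization."""
    z = (a - mean) / scale
    return exp_q(-0.5 * z * z, q)


# Compare how much mass each kernel leaves three scale units from the mean.
for q in (0.5, 1.0, 2.0):
    print(f"q={q}: kernel at a=3 is {q_gaussian_kernel(3.0, q):.4f}")
```

Under this convention the $q < 1$ kernel is exactly zero outside a bounded interval, the $q = 1$ kernel is the usual Gaussian, and the $q > 1$ kernel decays polynomially, which is the tail shape of a Student's t-distribution. Normalization constants are omitted, so the values are comparable only as relative tail weights.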

Code & Models

Repositories

lingweizhu/qexp (JAX, official)

Taxonomy

Topics: Advanced Optimization Algorithms Research · Numerical Methods and Algorithms