Fat-to-Thin Policy Optimization: Offline RL with Sparse Policies

Lingwei Zhu; Han Wang; Yukie Nagai

arXiv:2501.14373 · cs.LG · January 27, 2025

TL;DR

This paper introduces FtTPO, an offline reinforcement learning algorithm that trains sparse policies with the help of a heavy-tailed proposal policy. It is evaluated on a safety-critical treatment simulation and on standard MuJoCo benchmarks.

Contribution

The paper proposes the first offline policy optimization method designed specifically for sparse policies, using a fat-to-thin policy transfer approach built on the $q$-Gaussian family.
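
For readers unfamiliar with the $q$-Gaussian family, the standard Tsallis definitions are sketched below. The symbols $\mu$ (location) and $\beta > 0$ (inverse scale) are notation assumed here for illustration; the paper's own parameterization may differ.

$$
\exp_q(x) =
\begin{cases}
\exp(x), & q = 1, \\
\left[\, 1 + (1 - q)\,x \,\right]_+^{\frac{1}{1-q}}, & q \neq 1,
\end{cases}
\qquad
\pi_q(a) \propto \exp_q\!\left(-\beta\,(a - \mu)^2\right),
$$

where $[\cdot]_+ = \max(\cdot, 0)$. For $q < 1$ the density is exactly zero whenever $|a - \mu| > 1/\sqrt{\beta(1 - q)}$, which gives a sparse ("thin") distribution with bounded support. For $q = 1$ it reduces to the Gaussian, and for $1 < q < 3$ it is heavy-tailed ("fat") with polynomially decaying tails.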

Findings

01. Performs well in safety-critical treatment simulations.

02. Achieves favorable results on MuJoCo benchmarks.

03. Demonstrates effective learning from logged datasets for sparse policies.

Abstract

Sparse continuous policies are distributions that can choose some actions at random yet keep strictly zero probability for all other actions, which makes them radically different from the Gaussian. They have important real-world implications, e.g. in modeling safety-critical tasks such as medicine. Combining offline reinforcement learning with sparse policies provides a novel paradigm in which a safety-aware sparse policy is learned entirely from logged datasets. However, sparse policies can cause difficulty for existing offline algorithms, which require evaluating actions that fall outside the current support. In this paper, we propose the first offline policy optimization algorithm that tackles this challenge: Fat-to-Thin Policy Optimization (FtTPO). Specifically, we maintain a fat (heavy-tailed) proposal policy that effectively learns from the dataset and injects knowledge…
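
The support problem mentioned in the abstract can be illustrated with a small sketch. This is not the FtTPO algorithm; it only shows, under assumed values, why a logged action that falls outside a sparse policy's support is hard to evaluate. The function names, the values of `q`, `beta`, and `mean`, and the logged action are all hypothetical and chosen for illustration.

```python
import math


def exp_q(x: float, q: float) -> float:
    """Tsallis q-exponential; this sketch only evaluates it for x <= 0."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    # For q < 1 the bracket is truncated at zero, which creates bounded support.
    return max(base, 0.0) ** (1.0 / (1.0 - q))


def log_density_unnormalized(action: float, mean: float, beta: float, q: float) -> float:
    """Unnormalized q-Gaussian log-density; -inf outside the support."""
    density = exp_q(-beta * (action - mean) ** 2, q)
    return math.log(density) if density > 0.0 else -math.inf


# Hypothetical logged action lying far from the policy mean.
logged_action = 2.0

# Assumed illustrative settings: q = 0.5 is sparse, q = 2.0 is heavy-tailed.
thin = log_density_unnormalized(logged_action, mean=0.0, beta=1.0, q=0.5)
fat = log_density_unnormalized(logged_action, mean=0.0, beta=1.0, q=2.0)

print("thin (q=0.5):", thin)  # -inf: the action is outside the support
print("fat  (q=2.0):", fat)   # finite: heavy tails cover the action
```

With these assumed settings the sparse density has support radius $1/\sqrt{\beta(1-q)} \approx 1.41$, so the logged action at 2.0 receives zero density and a log-likelihood of negative infinity, while the heavy-tailed density assigns it a finite value. A log-likelihood objective on dataset actions is therefore well defined for the fat policy but not for the thin one.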

Code & Models

Repositories

lingweizhu/fat2thin (PyTorch, official)
