SPAARS: Safer RL Policy Alignment through Abstract Exploration and Refined Exploitation of Action Space

Swaminathan S K; Aritra Hazra

arXiv:2603.09378 · cs.LG · March 12, 2026

TL;DR

SPAARS introduces a curriculum learning framework for safe offline-to-online reinforcement learning that initially explores within a latent space and then transitions to raw actions, improving sample efficiency and performance.

Contribution

SPAARS proposes a curriculum that combines latent-space exploration with raw-action control, supported by theoretical bounds on the exploitation gap and proofs of variance reduction, for safer RL policy refinement.

Findings

1. SPAARS-SUPE achieves 0.825 normalized return with 5x sample efficiency.
2. Standalone SPAARS surpasses IQL baselines on benchmark tasks.
3. Theoretical bounds on exploitation gap and variance reduction are established.

Abstract

Offline-to-online reinforcement learning (RL) offers a promising paradigm for robotics by pre-training policies on safe, offline demonstrations and fine-tuning them via online interaction. However, a fundamental challenge remains: how to safely explore online without deviating from the behavioral support of the offline data? While recent methods leverage conditional variational autoencoders (CVAEs) to bound exploration within a latent space, they inherently suffer from an exploitation gap -- a performance ceiling imposed by the decoder's reconstruction loss. We introduce SPAARS, a curriculum learning framework that initially constrains exploration to the low-dimensional latent manifold for sample-efficient, safe behavioral improvement, then seamlessly transfers control to the raw action space, bypassing the decoder bottleneck. SPAARS has two instantiations: the CVAE-based variant…
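
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how control is handed over, so the hard switch at `switch_step`, the clipping bound `latent_bound`, and every name here (`CurriculumActor`, `toy_decoder`, and so on) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class CurriculumActor:
    """Hypothetical two-phase actor: latent control first, raw control later."""

    latent_policy: Callable[[Sequence[float]], Vector]  # state -> latent code z
    decoder: Callable[[Sequence[float], Sequence[float]], Vector]  # (state, z) -> action
    raw_policy: Callable[[Sequence[float]], Vector]  # state -> action
    switch_step: int  # assumed hard switch point (not from the paper)
    latent_bound: float = 1.0  # assumed clip range that keeps z near the data support

    def act(self, state: Sequence[float], step: int) -> Vector:
        if step < self.switch_step:
            # Phase 1: explore only in the bounded latent space, then decode.
            z = [
                max(-self.latent_bound, min(self.latent_bound, zi))
                for zi in self.latent_policy(state)
            ]
            return self.decoder(state, z)
        # Phase 2: act directly in the raw action space, bypassing the decoder.
        return self.raw_policy(state)


# Toy stand-ins for learned components (illustrative only).
def toy_latent_policy(state: Sequence[float]) -> Vector:
    return [2.0 * state[0]]


def toy_decoder(state: Sequence[float], z: Sequence[float]) -> Vector:
    return [0.5 * z[0], -0.5 * z[0]]


def toy_raw_policy(state: Sequence[float]) -> Vector:
    return [state[0], state[1]]


actor = CurriculumActor(toy_latent_policy, toy_decoder, toy_raw_policy, switch_step=100)
early = actor.act([1.0, 0.3], step=0)    # latent phase: z is clipped before decoding
late = actor.act([1.0, 0.3], step=100)   # raw phase: decoder is no longer used
```

In the latent phase every action passes through the decoder, so the reachable actions are limited to what the decoder can produce from bounded latent codes; in the raw phase that limit no longer applies. A smoother handoff (for example, blending the two policies) would be an equally plausible design under the same assumptions.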

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI