Learning Causal States Under Partial Observability and Perturbation

Na Li; Hangguan Shan; Wei Ni; Wenjie Zhang; Xinyu Li; Yamin Wang

arXiv:2512.00357·cs.LG·December 2, 2025

Open Access

TL;DR

This paper introduces CaDiff, a novel framework that improves reinforcement learning in partially observable and perturbed environments by uncovering causal states through an asynchronous diffusion model, backed by theoretical guarantees and empirical results.

Contribution

CaDiff is the first framework to use diffusion models for causal state approximation in P$^2$OMDPs, combining theoretical analysis with practical reinforcement learning improvements.

Findings

01. CaDiff improves RL returns by at least 14.18% in Roboschool tasks.
02. It provides a theoretical upper bound on value function approximation errors.
03. CaDiff effectively denoises observations to reveal underlying causal states.

Abstract

A critical challenge for reinforcement learning (RL) is making decisions based on incomplete and noisy observations, especially in perturbed and partially observable Markov decision processes (P$^2$OMDPs). Existing methods fail to mitigate perturbations while addressing partial observability. We propose Causal State Representation under Asynchronous Diffusion Model (CaDiff), a framework that enhances any RL algorithm by uncovering the underlying causal structure of P$^2$OMDPs. This is achieved by incorporating a novel asynchronous diffusion model (ADM) and a new bisimulation metric. ADM enables forward and reverse processes with different numbers of steps, thus interpreting the perturbation of P$^2$OMDP as part of the noise suppressed through diffusion. The bisimulation metric quantifies the similarity between partially observable environments and their causal counterparts.…
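The idea of treating a perturbation as part of the diffusion noise can be illustrated with a small sketch. This is not the paper's ADM. It assumes a standard DDPM-style variance-preserving forward process with a linear beta schedule, an additive Gaussian perturbation, and hypothetical names throughout (`alpha_bar_schedule`, `equivalent_step`, `continue_forward`, `forward_steps`). Under those assumptions, a perturbed observation can be rescaled so that it looks like a sample that has already taken some forward steps, which is one way the reverse process could end up needing more steps than the forward process explicitly applies.

```python
import numpy as np


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Return [abar_0, ..., abar_T] for a linear beta schedule, with abar_0 = 1."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.concatenate(([1.0], np.cumprod(1.0 - betas)))


def equivalent_step(perturb_std, abar):
    """Smallest step j whose noise level covers an additive Gaussian perturbation.

    Assumption: an observation s + perturb_std * eps, rescaled by
    1 / sqrt(1 + perturb_std**2), has the same form as x_j when
    abar_j = 1 / (1 + perturb_std**2).
    """
    target = 1.0 / (1.0 + perturb_std ** 2)
    covered = abar <= target
    # Clamp to the last step if the perturbation exceeds the schedule's noise range.
    return int(np.argmax(covered)) if covered.any() else len(abar) - 1


def continue_forward(x_j, j, m, abar, rng):
    """Sample x_m given x_j (m >= j) using the closed-form Gaussian transition."""
    ratio = abar[m] / abar[j]
    eps = rng.standard_normal(np.shape(x_j))
    return np.sqrt(ratio) * x_j + np.sqrt(1.0 - ratio) * eps


rng = np.random.default_rng(0)
abar = alpha_bar_schedule(num_steps=100)

perturb_std = 0.3                      # hypothetical perturbation strength
state = np.ones(4)                     # stand-in for an unobserved clean state
observation = state + perturb_std * rng.standard_normal(state.shape)

# Treat the rescaled observation as if it already sat at forward step j.
j = equivalent_step(perturb_std, abar)
x_j = observation / np.sqrt(1.0 + perturb_std ** 2)

forward_steps = 20                     # steps the forward process applies explicitly
m = min(j + forward_steps, len(abar) - 1)
x_m = continue_forward(x_j, j, m, abar, rng)

# A reverse process that removes the perturbation as well would run m steps,
# more than the forward_steps applied explicitly (no denoiser is trained here).
reverse_steps = m
print(forward_steps, reverse_steps, x_m.shape)
```

The sketch stops before the reverse pass because that would require a trained denoising network, which the abstract does not specify.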

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)