Learning Causal States Under Partial Observability and Perturbation
Na Li, Hangguan Shan, Wei Ni, Wenjie Zhang, Xinyu Li, Yamin Wang

TL;DR
This paper introduces CaDiff, a novel framework that improves reinforcement learning in partially observable and perturbed environments by uncovering causal states through an asynchronous diffusion model, backed by theoretical guarantees and empirical results.
Contribution
CaDiff is the first framework to use diffusion models for causal state approximation in perturbed and partially observable Markov decision processes (P$^2$OMDPs), combining theoretical analysis with practical reinforcement learning improvements.
Findings
CaDiff improves RL returns by at least 14.18% in Roboschool tasks.
It provides a theoretical upper bound on value function approximation errors.
CaDiff effectively denoises observations to reveal underlying causal states.
Abstract
A critical challenge for reinforcement learning (RL) is making decisions based on incomplete and noisy observations, especially in perturbed and partially observable Markov decision processes (P$^2$OMDPs). Existing methods fail to mitigate perturbations while addressing partial observability. We propose Causal State Representation under Asynchronous Diffusion Model (CaDiff), a framework that enhances any RL algorithm by uncovering the underlying causal structure of P$^2$OMDPs. This is achieved by incorporating a novel asynchronous diffusion model (ADM) and a new bisimulation metric. ADM allows the forward and reverse processes to use different numbers of steps, so that the perturbation of a P$^2$OMDP is treated as part of the noise that diffusion suppresses. The bisimulation metric quantifies the similarity between partially observable environments and their causal counterparts.…
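The idea of running the forward and reverse processes for different numbers of steps can be illustrated with a toy example. The sketch below is not the authors' ADM. It assumes, purely for illustration, that hidden states take values in {-1, +1}, that the environment's perturbation is Gaussian and equivalent to `perturb_steps` steps of a standard variance-preserving noise schedule, and that a closed-form oracle denoiser stands in for a learned network. All function names and parameters (`make_schedule`, `forward_noise`, `oracle_x0`, `reverse_denoise`, `demo`) are hypothetical. The forward process adds `forward_steps` steps of noise to the already perturbed observation, while the reverse process removes `perturb_steps + forward_steps` steps, so the perturbation is denoised together with the injected noise.

```python
import numpy as np

def make_schedule(num_levels, beta_min=1e-3, beta_max=1e-2):
    """Linear noise schedule; alpha_bars[t] is the signal fraction kept at level t."""
    betas = np.linspace(beta_min, beta_max, num_levels)
    alpha_bars = np.concatenate(([1.0], np.cumprod(1.0 - betas)))
    return betas, alpha_bars

def forward_noise(x, start, n_steps, betas, rng):
    """Apply n_steps variance-preserving noising steps, starting at level `start`."""
    for t in range(start + 1, start + n_steps + 1):
        beta = betas[t - 1]
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

def oracle_x0(x_t, alpha_bar):
    """E[x0 | x_t] for the toy prior x0 in {-1, +1} (assumed; replaces a trained model)."""
    return np.tanh(np.sqrt(alpha_bar) * x_t / (1.0 - alpha_bar))

def reverse_denoise(x, n_steps, alpha_bars):
    """Deterministic (DDIM-style) reverse pass from level n_steps down to level 0."""
    for t in range(n_steps, 0, -1):
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        x0_hat = oracle_x0(x, ab_t)
        eps_hat = (x - np.sqrt(ab_t) * x0_hat) / np.sqrt(1.0 - ab_t)
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x

def demo(seed=0, n=1000, perturb_steps=5, forward_steps=5):
    """Return mean absolute error of (perturbed observation, recovered state)."""
    rng = np.random.default_rng(seed)
    total_steps = perturb_steps + forward_steps
    betas, alpha_bars = make_schedule(total_steps)

    states = rng.choice([-1.0, 1.0], size=n)                   # hidden toy states
    obs = forward_noise(states, 0, perturb_steps, betas, rng)  # assumed perturbation

    # Asynchronous part: the forward pass adds only `forward_steps` steps,
    # but the reverse pass removes `total_steps`, covering the perturbation too.
    noised = forward_noise(obs, perturb_steps, forward_steps, betas, rng)
    recovered = reverse_denoise(noised, total_steps, alpha_bars)

    obs_err = float(np.mean(np.abs(obs - states)))
    rec_err = float(np.mean(np.abs(recovered - states)))
    return obs_err, rec_err
```

Calling `demo()` returns the mean absolute error of the perturbed observation and of the recovered state with respect to the hidden state. In this toy setting the recovered error should be the smaller of the two, because the longer reverse pass also removes the noise attributed to the perturbation.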
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI)
