Finite Memory Belief Approximation for Optimal Control in Partially Observable Markov Decision Processes
Mintae Kim

TL;DR
This paper develops a metric-based theory for finite memory belief approximation in partially observable stochastic control, quantifying how information loss affects control performance and validating the theory with LQG systems.
Contribution
The paper introduces a framework based on the Wasserstein metric for analyzing how finite memory belief approximations affect control performance in POMDPs, with explicit bounds and empirical validation.
Findings
Belief mismatch decays exponentially with memory length.
Performance degradation scales with belief mismatch.
The framework provides a metric-aware characterization of finite memory effects.
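The first finding can be illustrated with a minimal sketch, which is not the paper's experiment. It assumes a scalar linear-Gaussian system with no control input, where the belief is Gaussian and computed by a Kalman filter. The finite memory belief is modeled here as the same filter restarted from a fixed prior and run over only the last n observations. The 2-Wasserstein distance is one possible choice of metric order, and the function names and parameter values (a, q, r, p0) are all hypothetical. For scalar Gaussians the 2-Wasserstein distance has the closed form sqrt((m1 - m2)^2 + (s1 - s2)^2), with s the standard deviation.

```python
import numpy as np

def kalman_belief(ys, a, q, r, m0, p0):
    """Run a scalar Kalman filter over ys; return the posterior (mean, variance)."""
    m, p = m0, p0
    for y in ys:
        # Predict through x_{t+1} = a * x_t + w, with w ~ N(0, q).
        m, p = a * m, a * a * p + q
        # Correct with y_t = x_t + v, with v ~ N(0, r).
        k = p / (p + r)
        m, p = m + k * (y - m), (1.0 - k) * p
    return m, p

def w2_gaussian(m1, p1, m2, p2):
    """2-Wasserstein distance between scalar Gaussians N(m1, p1) and N(m2, p2)."""
    return float(np.sqrt((m1 - m2) ** 2 + (np.sqrt(p1) - np.sqrt(p2)) ** 2))

def simulate(T, a, q, r, p0, rng):
    """Sample T observations from the assumed scalar system."""
    x = rng.normal(0.0, np.sqrt(p0))
    ys = []
    for _ in range(T):
        x = a * x + rng.normal(0.0, np.sqrt(q))
        ys.append(x + rng.normal(0.0, np.sqrt(r)))
    return np.array(ys)

def mismatch_by_memory(memory_lengths, T=100, a=0.9, q=1.0, r=1.0,
                       p0=1.0, trials=100, seed=0):
    """Average W2 distance between full-history and n-step beliefs."""
    rng = np.random.default_rng(seed)
    out = {n: 0.0 for n in memory_lengths}
    for _ in range(trials):
        ys = simulate(T, a, q, r, p0, rng)
        full = kalman_belief(ys, a, q, r, 0.0, p0)
        for n in memory_lengths:
            # Finite memory: forget everything before the last n observations
            # and restart from the fixed prior N(0, p0) (an assumption).
            approx = kalman_belief(ys[-n:], a, q, r, 0.0, p0)
            out[n] += w2_gaussian(*full, *approx) / trials
    return out

if __name__ == "__main__":
    for n, d in mismatch_by_memory([1, 2, 4, 8, 16]).items():
        print(f"memory length {n:2d}: mean W2 mismatch {d:.3e}")
```

Plotting the printed mismatch against memory length on a logarithmic axis should give a roughly straight line in this toy setting, which is the shape an exponential decay would produce. The decay rate depends on the assumed system parameters.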
Abstract
We study finite memory belief approximation for partially observable (PO) stochastic optimal control (SOC) problems. While belief states are sufficient for SOC in partially observable Markov decision processes (POMDPs), they are generally infinite-dimensional and impractical. We interpret truncated input-output (IO) histories as inducing a belief approximation and develop a metric-based theory that directly relates information loss to control performance. Using the Wasserstein metric, we derive policy-conditional performance bounds that quantify value degradation induced by finite memory along typical closed-loop trajectories. Our analysis proceeds via a fixed-policy comparison: we evaluate two cost functionals under the same closed-loop execution and isolate the effect of replacing the true belief by its finite memory approximation inside the belief-level cost. For linear quadratic…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Advanced Bandit Algorithms Research
