TL;DR
This paper introduces a prediction-based Markov Violation Score (MVS) that detects non-Markovian observations in reinforcement learning, helping practitioners distinguish sensor-induced Markov violations from other sources of suboptimality.
Contribution
It proposes a novel, model-free score that quantifies non-Markovian structure in observation trajectories, validated across six environments and multiple algorithms.
Findings
MVS correlates positively with noise intensity in high-dimensional tasks.
Under training noise, reward degradation aligns with increased MVS.
MVS effectively identifies partial observability and guides architecture choices.
Abstract
Reinforcement learning algorithms assume that observations satisfy the Markov property, yet real-world sensors frequently violate this assumption through correlated noise, latency, or partial observability. Standard performance metrics conflate Markov breakdowns with other sources of suboptimality, leaving practitioners without tools to detect such violations. This paper introduces a prediction-based Markov Violation Score (MVS) that quantifies non-Markovian structure in observation trajectories. A random forest first removes nonlinear Markov-compliant dynamics; ridge regression then tests whether historical observations reduce prediction error on the residuals beyond what the current observation provides. The resulting score is bounded in [0, 1] and requires no causal graph construction. Evaluation spans six environments (CartPole, Pendulum, Acrobot, HalfCheetah, Hopper, Walker2d),…
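The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact formula, so the score here is assumed to be the clipped relative reduction in held-out ridge error when past observations are added. The function names `markov_violation_score` and `simulate_ar`, the `history` and `alpha` parameters, the forest settings, and the chronological three-way split are all hypothetical choices made for this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def markov_violation_score(obs, history=3, alpha=1.0, seed=0):
    """Illustrative MVS-style score in [0, 1] for one observation trajectory.

    obs: array of shape (T,) or (T, d), ordered in time.
    history, alpha, and the final error ratio are assumptions of this sketch.
    """
    obs = np.asarray(obs, dtype=float)
    if obs.ndim == 1:
        obs = obs[:, None]
    T = obs.shape[0]

    # Align rows: current o_t, past o_{t-1}..o_{t-history}, target o_{t+1}.
    t = np.arange(history, T - 1)
    current = obs[t]
    past = np.hstack([obs[t - k] for k in range(1, history + 1)])
    target = obs[t + 1]

    # Chronological thirds: forest fit / ridge fit / evaluation.
    a, b = len(t) // 3, 2 * len(t) // 3

    # Stage 1: the forest learns the (possibly nonlinear) map o_t -> o_{t+1}.
    forest = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=20, random_state=seed
    )
    y_fit = target[:a] if target.shape[1] > 1 else target[:a, 0]
    forest.fit(current[:a], y_fit)
    # Out-of-sample residuals: what the current observation alone leaves unexplained.
    resid = target[a:] - forest.predict(current[a:]).reshape(target[a:].shape)

    cur_feats = current[a:]
    full_feats = np.hstack([current[a:], past[a:]])
    n_fit = b - a

    def held_out_mse(feats):
        # Stage 2: ridge regression on the residuals, scored on the last third.
        model = Ridge(alpha=alpha).fit(feats[:n_fit], resid[:n_fit])
        err = resid[n_fit:] - model.predict(feats[n_fit:])
        return float(np.mean(err ** 2))

    mse_current = held_out_mse(cur_feats)
    mse_history = held_out_mse(full_feats)
    if mse_current <= 0.0:
        return 0.0
    # Assumed score: fraction of residual error removed by adding history.
    return float(np.clip(1.0 - mse_history / mse_current, 0.0, 1.0))


def simulate_ar(coeffs, steps=4000, seed=0):
    """Hypothetical test signal: o_{t+1} = sum_k coeffs[k] * o_{t-k} + unit Gaussian noise."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    o = np.zeros(steps + p)
    for i in range(p, steps + p):
        o[i] = sum(c * o[i - 1 - k] for k, c in enumerate(coeffs)) + rng.normal()
    return o[p:]
```

Under these assumptions, a first-order signal such as `simulate_ar([0.9])` should score near zero, because the past adds nothing once the current observation is known, while a second-order signal such as `simulate_ar([0.1, 0.8])` should score clearly higher. Fitting the forest and the ridge models on separate chronological segments is one way to keep in-sample overfitting from masquerading as a Markov violation.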
