Optimal sequential decision-making for error propagation mitigation in digital twins
Annice Najafi, Shokoufeh Mirzaei

TL;DR
This paper develops and compares optimal decision-making frameworks, including MDP and POMDP, for mitigating error propagation in digital twins, validated through simulations and reinforcement learning benchmarks.
Contribution
It introduces a novel MDP/POMDP-based approach for error mitigation in digital twins, extending prior HMM-based regime inference with decision process modeling.
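To make the decision-process framing concrete, here is a minimal sketch of a regime MDP solved by value iteration. Value iteration is swapped in as one standard dynamic-programming solver and is not confirmed here as the authors' method. The two regimes, two actions, transition probabilities, rewards, and discount factor are all hypothetical illustration values, not taken from the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Solve a finite MDP by value iteration.

    P: array of shape (A, S, S), P[a, s, t] = Pr(next = t | state = s, action = a)
    R: array of shape (S, A), immediate reward for taking action a in state s
    Returns the value function V (S,) and a greedy policy (S,).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy problem: states 0 = nominal, 1 = degraded;
# actions 0 = do nothing, 1 = corrective intervention.
P = np.array([
    [[0.9, 0.1], [0.0, 1.0]],   # do nothing: errors can appear and then persist
    [[1.0, 0.0], [0.9, 0.1]],   # intervene: usually restores the nominal regime
])
# Reward = fidelity benefit of the state minus an assumed intervention cost of 0.5.
R = np.array([
    [1.0, 0.5],
    [-1.0, -1.5],
])

V, policy = value_iteration(P, R)
print("values:", V)
print("policy:", policy)
```

With these assumed numbers the greedy policy leaves the nominal regime alone and intervenes only in the degraded regime, which is the kind of fidelity-versus-maintenance tradeoff the reward is meant to encode.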
Findings
The MDP policy achieves the highest cumulative reward and the most time in nominal operation.
The POMDP policy recovers about 95% of the MDP's performance under observation noise.
Reinforcement learning algorithms learn effective policies without an explicit model.
Abstract
Here, we frame error propagation mitigation in modular digital twins as a sequential decision process. Building on a companion study that used a Hidden Markov Model (HMM) to infer latent error regimes from surrogate-physics residuals, we develop a Markov Decision Process (MDP) in which the inferred regimes serve as states, corrective interventions serve as actions, and a scalar reward captures the cost-benefit tradeoff between system fidelity and maintenance expense. The baseline transition matrix is extracted from the HMM-learned parameters. We then extend the formulation to a Partially Observable MDP (POMDP) that accounts for the imperfect nature of regime classification by maintaining a belief distribution updated via Bayesian filtering, with the HMM confusion matrix serving as the observation model. Both formulations are solved via dynamic…
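The belief update described in the abstract can be sketched as a single predict-then-correct step of a discrete Bayes filter. This is an illustrative sketch under stated assumptions: the three regimes, the transition matrix `T`, and the confusion matrix `O` below are hypothetical values, and the function name `belief_update` is not from the paper. In a POMDP the transition matrix would be the one associated with the action just taken.

```python
import numpy as np

def belief_update(belief, T, O, obs):
    """One step of discrete Bayesian filtering over latent regimes.

    belief: (S,) prior probability of each regime
    T: (S, S) row-stochastic matrix, T[i, j] = Pr(next = j | current = i)
    O: (S, K) confusion matrix, O[j, k] = Pr(classified as k | true regime = j)
    obs: index of the regime label reported by the classifier
    """
    predicted = belief @ T                  # predict: propagate through the dynamics
    unnormalized = predicted * O[:, obs]    # correct: weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Hypothetical three-regime example (values chosen for illustration only).
T = np.array([
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])
O = np.array([
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.10, 0.85],
])
prior = np.full(3, 1.0 / 3.0)               # start with no knowledge of the regime
posterior = belief_update(prior, T, O, obs=2)
print(posterior)
```

Starting from a uniform prior, a single observation of label 2 shifts most of the belief mass onto regime 2, but not all of it, because the confusion matrix admits misclassification.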
