When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift

Kevin Vogt-Lowell; Theodoros Tsiligkaridis; Rodney Lafuente-Mercado; Surabhi Ghatti; Shanghua Gao; Marinka Zitnik; Daniela Rus

arXiv:2603.04648 · cs.LG · March 25, 2026

TL;DR

This paper augments PPO with temporal sequence models, such as Transformers and State Space Models (SSMs), to make reinforcement learning policies more robust to sensor failures and observation drift, and supports the approach with a theoretical bound and empirical validation.

Contribution

The paper combines PPO with sequence models to handle the partial observability induced by sensor failures, proves a high-probability bound on reward degradation, and demonstrates improved robustness on MuJoCo continuous-control benchmarks.

Findings

01. Transformers outperform baselines under sensor dropout.

02. Theoretical bounds relate robustness to policy smoothness and failure persistence.

03. Sequence models maintain high performance despite severe sensor failures.

Abstract

Real-world reinforcement learning systems must operate under distributional drift in their observation streams, yet most policy architectures implicitly assume fully observed and noise-free states. We study robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures that induce partial observability and representation shift. To respond to this drift, we augment PPO with temporal sequence models, including Transformers and State Space Models (SSMs), to enable policies to infer missing information from history and maintain performance. Under a stochastic sensor failure process, we prove a high-probability bound on infinite-horizon reward degradation that quantifies how robustness depends on policy smoothness and failure persistence. Empirically, on MuJoCo continuous-control benchmarks with severe sensor dropout, we show Transformer-based sequence policies…
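
To make "temporally persistent sensor failures" concrete, the sketch below simulates one possible failure process and builds the kind of history window a sequence policy could consume. It is an illustration only, not the paper's code: the two-state Markov chain per sensor, the parameters `p_fail` and `p_recover`, the choice to zero out failed readings and append the mask, and all function names are assumptions introduced here.

```python
import numpy as np

def simulate_sensor_mask(steps, n_sensors, p_fail, p_recover, rng):
    """Assumed failure model: each sensor follows its own two-state Markov chain.

    Returns an array of shape (steps, n_sensors) with 1.0 = working, 0.0 = failed.
    A small p_recover makes failures persist over many steps.
    """
    mask = np.ones((steps, n_sensors), dtype=np.float64)
    working = np.ones(n_sensors, dtype=bool)  # all sensors start healthy
    for t in range(steps):
        u = rng.random(n_sensors)
        # Working sensors fail with probability p_fail;
        # failed sensors recover with probability p_recover.
        working = np.where(working, u >= p_fail, u < p_recover)
        mask[t] = working
    return mask

def corrupt_observations(obs, mask):
    """Zero out failed readings and append the mask as extra features (an assumption)."""
    return np.concatenate([obs * mask, mask], axis=-1)

def history_window(seq, t, context):
    """Return the last `context` rows up to and including step t, left-padded with zeros."""
    start = max(0, t - context + 1)
    window = seq[start:t + 1]
    pad = np.zeros((context - len(window), seq.shape[1]))
    return np.concatenate([pad, window], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(200, 6))  # stand-in for environment observations
    mask = simulate_sensor_mask(200, 6, p_fail=0.05, p_recover=0.1, rng=rng)
    corrupted = corrupt_observations(obs, mask)
    window = history_window(corrupted, t=10, context=32)
    print(window.shape)  # (context, 2 * n_sensors)
```

In a setup like this, a Transformer or SSM policy would receive `window` in place of the single current observation, giving it the chance to infer a failed reading from earlier steps in which that sensor still worked.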

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research