Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning
Xue Zhou, Dapeng Man, Chen Xu, Fanyi Zeng, Tao Liu, Huan Wang, Shucheng He, Chaoyang Gao, Wu Yang

TL;DR
This paper introduces a sequence-level coverage measure for offline RL, shows that it exponentially amplifies the upper bound on estimation error, and demonstrates a poisoning attack that significantly degrades agent performance by targeting rare decision patterns.
Contribution
The paper proposes the sequence-level concentrability coefficient, analyzes how it amplifies the upper bound on estimation error, and develops a poisoning attack that exploits multi-step decision patterns to reduce data coverage.
Findings
Poisoning 1% of data can reduce performance by 90%.
Sequence-level coverage exponentially amplifies the upper bound on estimation errors.
The attack targets rare decision patterns to cause coverage collapse.
Abstract
Offline reinforcement learning (RL) heavily relies on the coverage of pre-collected data over the target policy's distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shifts, but overlook security risks from insufficient coverage, and the single-step analysis is not consistent with the multi-step decision-making nature of offline RL. To address this, we introduce the sequence-level concentrability coefficient to quantify coverage, and reveal its exponential amplification on the upper bound of estimation errors through theoretical analysis. Building on this, we propose the Collapsing Sequence-Level Data-Policy Coverage (CSDPC) poisoning attack. Considering the continuous nature of offline RL data, we convert state-action pairs into decision units, and extract representative decision patterns that capture multi-step behavior. We identify rare…
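The pipeline the abstract outlines (continuous state-action pairs turned into decision units, multi-step decision patterns extracted, rare patterns identified) can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform binning used for discretization, the fixed-length n-gram definition of a pattern, the count threshold for rarity, and the helper names `to_decision_units`, `pattern_counts`, and `rare_patterns` are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def to_decision_units(states, actions, n_bins=4):
    """Map continuous (state, action) rows to discrete unit ids.

    Assumption: uniform per-dimension binning stands in for whatever
    discretization the paper actually uses.
    """
    x = np.concatenate([states, actions], axis=1)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    bins = np.minimum(((x - lo) / span * n_bins).astype(int), n_bins - 1)
    # Each distinct row of bin indices becomes one decision unit id.
    _, unit_ids = np.unique(bins, axis=0, return_inverse=True)
    return unit_ids.reshape(-1)


def pattern_counts(unit_ids, episode_ends, n=3):
    """Count length-n runs of decision units within each episode.

    Assumption: a "decision pattern" is modeled as a plain n-gram.
    episode_ends holds the exclusive end index of every episode.
    """
    counts = Counter()
    start = 0
    for end in episode_ends:
        seq = unit_ids[start:end]
        for i in range(len(seq) - n + 1):
            counts[tuple(int(u) for u in seq[i:i + n])] += 1
        start = end
    return counts


def rare_patterns(counts, max_count=1):
    """Return patterns seen at most max_count times (hypothetical rarity rule)."""
    return {p for p, c in counts.items() if c <= max_count}
```

For example, calling `to_decision_units` on a single five-step episode, then `pattern_counts(..., n=2)` and `rare_patterns`, flags the two-step transitions that occur only once. Under this toy definition, those are the thinly covered sequences that an attack on coverage would aim at.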
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
