Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning

Xue Zhou; Dapeng Man; Chen Xu; Fanyi Zeng; Tao Liu; Huan Wang; Shucheng He; Chaoyang Gao; Wu Yang

arXiv:2506.11172 · cs.LG · June 16, 2025

TL;DR

This paper introduces a sequence-level concentrability coefficient to measure data-policy coverage in offline RL, shows that it exponentially amplifies the upper bound on estimation errors, and demonstrates a poisoning attack (CSDPC) that significantly degrades agent performance by targeting rare decision patterns.

Contribution

The paper proposes the sequence-level concentrability coefficient, analyzes how it amplifies the upper bound on estimation errors, and develops the CSDPC poisoning attack, which exploits multi-step decision patterns to reduce data coverage.

Findings

01. Poisoning 1% of the data can reduce agent performance by 90%.

02. Sequence-level coverage exponentially amplifies the upper bound on estimation errors.

03. The attack targets rare decision patterns to cause coverage collapse.

Abstract

Offline reinforcement learning (RL) heavily relies on the coverage of pre-collected data over the target policy's distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shifts, but overlook security risks from insufficient coverage, and the single-step analysis is not consistent with the multi-step decision-making nature of offline RL. To address this, we introduce the sequence-level concentrability coefficient to quantify coverage, and reveal its exponential amplification on the upper bound of estimation errors through theoretical analysis. Building on this, we propose the Collapsing Sequence-Level Data-Policy Coverage (CSDPC) poisoning attack. Considering the continuous nature of offline RL data, we convert state-action pairs into decision units, and extract representative decision patterns that capture multi-step behavior. We identify rare…
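
The abstract's idea of turning continuous state-action pairs into decision units and then looking for rare multi-step patterns can be illustrated with a small sketch. This is not the paper's algorithm: the uniform binning in `to_decision_unit`, the fixed-length n-gram patterns, the `max_fraction` threshold, and all function names are assumptions made here for illustration only. The sketch only measures which patterns are thinly covered in a dataset and does not modify any data.

```python
from collections import Counter


def to_decision_unit(state, action, bin_width=0.5):
    """Map a continuous (state, action) pair to a discrete tuple.

    Assumption: uniform binning stands in for whatever discretization
    the paper actually uses.
    """
    return tuple(int(x // bin_width) for x in (*state, *action))


def extract_patterns(trajectory, n=2, bin_width=0.5):
    """Return all length-n runs of consecutive decision units (n-grams)."""
    units = [to_decision_unit(s, a, bin_width) for s, a in trajectory]
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def rare_patterns(trajectories, n=2, bin_width=0.5, max_fraction=0.1):
    """Count patterns over the dataset and keep those whose share of all
    pattern occurrences is at most max_fraction (a hypothetical threshold)."""
    counts = Counter()
    for traj in trajectories:
        counts.update(extract_patterns(traj, n, bin_width))
    total = sum(counts.values())
    rare = {p: c for p, c in counts.items() if c / total <= max_fraction}
    return rare, counts


# Toy dataset: nine copies of a common trajectory and one unusual one.
common = [((0.1,), (0.1,)), ((0.6,), (0.1,)), ((1.1,), (0.1,))]
unusual = [((0.1,), (0.9,)), ((0.6,), (0.9,))]
dataset = [common] * 9 + [unusual]

rare, counts = rare_patterns(dataset, n=2)
for pattern, count in rare.items():
    print(pattern, count)
```

In this toy dataset the only pattern flagged as rare is the one contributed by the single unusual trajectory, which is the kind of thinly covered multi-step behavior the abstract describes as relevant to sequence-level coverage.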

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics