Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning

Sze-Ann Chen; Zhi-Yi Chin; Kui-Yuan Chen; Chi-Yu Li; Ping-Chun Hsieh

arXiv:2605.09638·cs.LG·May 12, 2026

TL;DR

Plan2Cleanse is a test-time defense for RL models that uses Monte Carlo planning to detect and neutralize backdoor attacks without retraining, with reported gains across MuJoCo, wireless network (O-RAN), and Atari environments.

Contribution

It recasts backdoor detection as a planning problem, enabling systematic exploration and mitigation of backdoors in RL models at test time.

Findings

01 Increased trigger detection success rates by over 61.4 percentage points in O-RAN scenarios.

02 Improved win rates from 35% to 53% in Humanoid environments.

03 Effective test-time defense demonstrated across MuJoCo, wireless networks, and Atari environments.

Abstract

Ensuring the security of reinforcement learning (RL) models is critical, particularly when they are trained by third parties and deployed in real-world systems. Attackers can implant backdoors into these models, causing them to behave normally under typical conditions, but execute malicious behaviors when specific triggers are activated. In this work, we propose Plan2Cleanse, a test-time detection and mitigation framework that adapts Monte Carlo Tree Search to efficiently identify and neutralize RL backdoor attacks without requiring model retraining. Our approach recasts backdoor detection as a planning problem, enabling systematic exploration of temporally extended trigger sequences while maintaining black-box access to the target policy. By leveraging the detection results, Plan2Cleanse can further achieve efficient mitigation through tree-search preventive replanning. We evaluated…
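To make the "detection as planning" idea concrete, here is a minimal toy sketch of tree search over adversary action sequences with black-box queries only. It is not the authors' implementation: the victim model, the action space, the horizon, the hidden trigger, and every name below (`victim_misbehaves`, `search_trigger`, `SECRET_TRIGGER`, and so on) are hypothetical stand-ins chosen for illustration, and plain UCT with random rollouts is assumed in place of whatever search variant the paper uses.

```python
import math
import random

N_ACTIONS = 3                # hypothetical adversary action space
HORIZON = 6                  # hypothetical maximum trigger-sequence length
SECRET_TRIGGER = (2, 0, 1)   # hidden inside the toy victim; the search never reads it


def victim_misbehaves(adv_actions):
    """Black-box query: does the toy victim fail after this adversary action sequence?"""
    seq = tuple(adv_actions)
    k = len(SECRET_TRIGGER)
    return any(seq[i:i + k] == SECRET_TRIGGER for i in range(len(seq) - k + 1))


class Node:
    def __init__(self, prefix):
        self.prefix = prefix   # adversary actions taken so far
        self.children = {}     # action -> Node
        self.visits = 0
        self.value = 0.0       # sum of rollout rewards seen through this node


def uct_child(node, c=1.4):
    # Choose the child with the best mean reward plus exploration bonus (UCT).
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search_trigger(iterations=2000, seed=0):
    """Return the first action sequence found that makes the toy victim misbehave, or None."""
    rng = random.Random(seed)
    root = Node(())
    found = None
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded and below the horizon.
        while len(node.prefix) < HORIZON and len(node.children) == N_ACTIONS:
            node = uct_child(node)
            path.append(node)
        # Expansion: add one untried action, unless we are already at the horizon.
        if len(node.prefix) < HORIZON:
            untried = [a for a in range(N_ACTIONS) if a not in node.children]
            a = rng.choice(untried)
            node.children[a] = Node(node.prefix + (a,))
            node = node.children[a]
            path.append(node)
        # Simulation: complete the sequence at random, then query the black box once.
        rollout = list(node.prefix)
        while len(rollout) < HORIZON:
            rollout.append(rng.randrange(N_ACTIONS))
        reward = 1.0 if victim_misbehaves(rollout) else 0.0
        if reward and found is None:
            found = tuple(rollout)
        # Backpropagation: update statistics along the selected path.
        for n in path:
            n.visits += 1
            n.value += reward
    return found


if __name__ == "__main__":
    print(search_trigger())
```

In this sketch the reward is simply whether the victim fails, so the tree statistics steer later queries toward prefixes that have already exposed the backdoor. A real environment would replace `victim_misbehaves` with rollouts against the deployed policy and a task-specific failure signal.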

Code & Models

Repositories

rl-bandits-lab/RL-Backdoor (GitHub)
