TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models
Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Xiaochun Cao, Shouling Ji, Jiaheng Zhang, Jincai Huang, Li Shen

TL;DR
TrojanTO introduces a novel action-level backdoor attack against trajectory optimization models, demonstrating effective, stealthy, and scalable attacks across various tasks and architectures with minimal data poisoning.
Contribution
It is the first to propose an action-level backdoor attack on TO models, employing alternating training and trajectory filtering for effectiveness and stealth.
Findings
Effective backdoor implantation with only 0.3% trajectory poisoning
Successful attacks across diverse TO tasks and architectures
High attack success rate with low detection risk
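To give a sense of scale for the 0.3% figure, the following sketch converts a poisoning rate into a trajectory count. The dataset sizes are hypothetical and chosen only for illustration, and rounding up to at least one trajectory is an assumption of this sketch, not a rule stated in the paper.

```python
def poisoned_trajectory_count(num_trajectories: int, poison_rate: float = 0.003) -> int:
    """Return how many trajectories a given poisoning rate corresponds to.

    Assumption (not from the paper): the count is rounded to the nearest
    integer and at least one trajectory is poisoned for a non-empty dataset.
    """
    if num_trajectories < 0:
        raise ValueError("num_trajectories must be non-negative")
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must lie in [0, 1]")
    if num_trajectories == 0 or poison_rate == 0.0:
        return 0
    return max(1, round(num_trajectories * poison_rate))


# Hypothetical dataset sizes, for illustration only.
small = poisoned_trajectory_count(100)    # rounds to 0, then floored at 1
medium = poisoned_trajectory_count(2000)  # 2000 * 0.003 = 6
```

At this rate, a dataset of a few thousand trajectories would need only a handful of poisoned ones.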
Abstract
Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerability to backdoor attacks is poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation, which is largely ineffective against TO models because of their inherent sequence-modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to enhance the connection between triggers and target actions for attack effectiveness. To improve attack stealth, it utilizes precise poisoning via trajectory filtering for normal performance and batch poisoning for trigger…
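To make the distinction between reward manipulation and action-level poisoning concrete, here is a minimal sketch of what poisoning one step of an offline trajectory could look like. It is not the authors' implementation: the function name, the array layout (time steps by dimensions), and the idea of writing fixed values into a few state dimensions as the trigger are all assumptions for illustration. Alternating training, trajectory filtering, and batch poisoning are not reproduced here.

```python
import numpy as np


def poison_step(states, actions, t, trigger_dims, trigger_values, target_action):
    """Return poisoned copies of (states, actions) for a single time step.

    Hypothetical helper: the trigger is written into selected state
    dimensions at step t, and the action at step t is replaced by the
    attacker's target action. Rewards are deliberately left untouched,
    which is what makes this an action-level rather than reward-level edit.
    """
    poisoned_states = np.array(states, dtype=float, copy=True)
    poisoned_actions = np.array(actions, dtype=float, copy=True)
    poisoned_states[t, trigger_dims] = trigger_values
    poisoned_actions[t] = target_action
    return poisoned_states, poisoned_actions


# Toy trajectory: 5 steps, 4-dimensional states, 2-dimensional actions.
states = np.zeros((5, 4))
actions = np.zeros((5, 2))
p_states, p_actions = poison_step(
    states, actions, t=3,
    trigger_dims=[0, 2], trigger_values=[1.5, -1.5],
    target_action=[1.0, -1.0],
)
```

The inputs are copied rather than modified in place, so the clean trajectory stays available for measuring benign performance.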
Peer Reviews
Decision: ICLR 2026 Poster
The TrojanTO design consists of three key components: Alternating Training - reinforces the association between the trigger pattern and target action; Trajectory Filtering - maintains benign performance by filtering non-critical trajectories; and Batch Poisoning - ensures trigger consistency and stealthiness across evaluation conditions. This novel combination of ideas makes TrojanTO effective and data-efficient for injecting post-training, action-level backdoors in offline trajectory optimizati
The paper does not discuss possible defenses or mitigation strategies against TrojanTO. Even a brief evaluation of trigger detectability, model auditing, or defensive retraining would help position this research within the broader context of AI security and trustworthiness. Appendix B.1 has some discussion of defenses, but a more focused synopsis of the experiments could be added to the main paper.
1. The paper is among the first few to study backdoor attacks in Trajectory Optimization settings and to expose the vulnerability of TO models to them. 2. The authors propose an effective method, TrojanTO, that installs a backdoor in a pretrained policy through fine-tuning. 3. The authors present good experimental results that validate the effectiveness of their algorithm using datasets from different environments.
1. TO is nothing but a supervised learning problem where the input space is a sequence and the output space is an action. As such, attacking a TO algorithm is the same as attacking a supervised sequential model, which has been studied extensively in the past. So the correct comparison would be against a backdoor attack method from the supervised learning setting. 2. It is not surprising that reward manipulation does not lead to successful backdoors in TO because TO never tries to optimize for rewards so
The main strength of this paper lies in its novel problem formulation and practical significance. By shifting the focus from training-time to post-training attacks, TrojanTO highlights an emerging vulnerability relevant to supply-chain security in large pretrained RL models. The analysis of key contributing factors—action, state, and reward—is methodical and provides new insights into the structural vulnerabilities of TO models. Methodologically, TrojanTO’s design elegantly combines trajectory f
Despite its strong empirical performance, the work is somewhat limited in theoretical depth and scope of generalization. The core algorithmic components—particularly alternating optimization via MI-FGSM and trajectory filtering—are based on established techniques, and the paper lacks a unifying theoretical framework to quantify stealth–efficacy trade-offs or to analyze convergence guarantees. Additionally, the experimental evaluation, while extensive, remains confined to the D4RL suite, which pr
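For readers unfamiliar with MI-FGSM (the momentum iterative fast gradient sign method), which the review above names as an established building block, the following is a generic sketch of its update rule in a targeted form. It is not the paper's trigger-optimization code: the toy linear policy, the squared-error loss, and all parameter values are assumptions chosen so that the gradient can be written analytically.

```python
import numpy as np


def mi_fgsm_targeted(grad_fn, delta0, epsilon, steps=10, mu=1.0):
    """Targeted MI-FGSM on a perturbation delta within an L-infinity ball.

    Each step accumulates the L1-normalized gradient into a momentum
    term g, then moves delta against sign(g) to decrease the loss.
    """
    alpha = epsilon / steps  # per-step size so the total budget is epsilon
    delta = np.array(delta0, dtype=float, copy=True)
    g = np.zeros_like(delta)
    for _ in range(steps):
        grad = grad_fn(delta)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        delta = np.clip(delta - alpha * np.sign(g), -epsilon, epsilon)
    return delta


# Toy setup (hypothetical): linear policy a = W @ (s + delta).
W = np.eye(2)
s = np.zeros(2)
a_target = np.array([1.0, -1.0])
epsilon = 0.5


def loss(delta):
    # Squared distance between the policy output and the target action.
    return float(np.sum((W @ (s + delta) - a_target) ** 2))


def grad_fn(delta):
    # Analytic gradient of the squared-error loss with respect to delta.
    return 2.0 * W.T @ (W @ (s + delta) - a_target)


delta = mi_fgsm_targeted(grad_fn, np.zeros(2), epsilon)
```

The original MI-FGSM formulation ascends a classification loss to cause misclassification; the sign is flipped here so that the perturbation pulls the output toward a chosen target instead.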
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Formal Methods in Verification
