TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

Yang Dai; Oubo Ma; Longfei Zhang; Xingxing Liang; Xiaochun Cao; Shouling Ji; Jiaheng Zhang; Jincai Huang; Li Shen

arXiv:2506.12815·cs.LG·June 17, 2025

Open Access · 3 Reviews

TL;DR

TrojanTO introduces a novel action-level backdoor attack against trajectory optimization models, demonstrating effective, stealthy, and scalable attacks across various tasks and architectures with minimal data poisoning.

Contribution

The paper is the first to propose an action-level backdoor attack on TO models. The attack uses alternating training and trajectory filtering to achieve effectiveness and stealth.

Findings

01. Effective backdoor implantation with only 0.3% trajectory poisoning

02. Successful attacks across diverse TO tasks and architectures

03. High attack success rate with low detection risk

Abstract

Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerabilities against backdoor attacks are poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation, which are largely ineffective against the TO model due to its inherent sequence modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to enhance the connection between triggers and target actions for attack effectiveness. To improve attack stealth, it utilizes precise poisoning via trajectory filtering for normal performance and batch poisoning for trigger…
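To make "action-level" poisoning concrete, here is a minimal sketch of how a small fraction of offline trajectories could be poisoned by stamping a trigger onto a state and overwriting the paired action, while leaving rewards untouched. This is an illustration under stated assumptions, not the authors' implementation: the function name `poison_trajectories`, the choice to poison only the final timestep, the additive-free "overwrite the first state dimensions" trigger, and the array layout are all hypothetical. The default rate of 0.003 mirrors the 0.3% figure quoted in the findings.

```python
import numpy as np

def poison_trajectories(states, actions, trigger, target_action,
                        rate=0.003, seed=0):
    """Return poisoned copies of (states, actions) and the poisoned indices.

    states:  array of shape (N, T, state_dim)
    actions: array of shape (N, T, action_dim)
    trigger: array of shape (d,), d <= state_dim (hypothetical trigger pattern)
    target_action: array of shape (action_dim,)
    """
    rng = np.random.default_rng(seed)
    n = states.shape[0]
    # Number of trajectories to poison; at least one so the sketch does something.
    k = max(1, int(round(rate * n)))
    idx = rng.choice(n, size=k, replace=False)

    poisoned_states = states.copy()
    poisoned_actions = actions.copy()
    d = trigger.shape[0]
    # Assumption: the trigger overwrites the first d state dimensions
    # at the last timestep of each selected trajectory.
    poisoned_states[idx, -1, :d] = trigger
    # Action-level manipulation: the paired action becomes the target action.
    poisoned_actions[idx, -1, :] = target_action
    return poisoned_states, poisoned_actions, idx
```

With 1,000 trajectories and the default rate, only 3 trajectories are modified, and rewards or returns-to-go are never changed, which is the distinction from reward-manipulation attacks drawn in the abstract.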

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

The TrojanTO design consists of three key components: Alternating Training - reinforces the association between the trigger pattern and target action; Trajectory Filtering - maintains benign performance by filtering non-critical trajectories; and Batch Poisoning - ensures trigger consistency and stealthiness across evaluation conditions. This novel combination of ideas makes TrojanTO effective and data-efficient for injecting post-training, action-level backdoors in offline trajectory optimizati

Weaknesses

The paper does not discuss possible defenses or mitigation strategies against TrojanTO. Even a brief evaluation of trigger detectability, model auditing, or defensive retraining would help position this research within the broader context of AI security and trustworthiness. Appendix B.1 has some discussion of defenses, but a more focused synopsis of the experiments could be added to the main paper.

Reviewer 02 · Rating 4 · Confidence 5

Strengths

1. The paper is among the first to study backdoor attacks in Trajectory Optimization settings and to expose the vulnerability of these models.
2. The authors propose an effective method, TrojanTO, that installs a backdoor in a pretrained policy through fine-tuning.
3. The authors present good experimental results that validate the effectiveness of their algorithm on datasets from different environments.

Weaknesses

1. TO is nothing but a supervised learning problem where the input space is a sequence and the output space is an action. As such, attacking a TO algorithm is the same as attacking a supervised sequential model, which has been studied extensively in the past. The appropriate comparison is therefore against a backdoor attack method from the supervised learning setting.
2. It is not surprising that reward manipulation does not lead to successful backdoors in TO because TO never tries to optimize for rewards so

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The main strength of this paper lies in its novel problem formulation and practical significance. By shifting the focus from training-time to post-training attacks, TrojanTO highlights an emerging vulnerability relevant to supply-chain security in large pretrained RL models. The analysis of key contributing factors—action, state, and reward—is methodical and provides new insights into the structural vulnerabilities of TO models. Methodologically, TrojanTO’s design elegantly combines trajectory f

Weaknesses

Despite its strong empirical performance, the work is somewhat limited in theoretical depth and scope of generalization. The core algorithmic components—particularly alternating optimization via MI-FGSM and trajectory filtering—are based on established techniques, and the paper lacks a unifying theoretical framework to quantify stealth–efficacy trade-offs or to analyze convergence guarantees. Additionally, the experimental evaluation, while extensive, remains confined to the D4RL suite, which pr
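For readers unfamiliar with MI-FGSM (the momentum iterative fast gradient sign method), which the review above names as the basis of the alternating optimization, the following is a generic sketch of the update applied to a trigger perturbation. It is not the paper's procedure: the toy linear policy `W`, the squared-error targeted loss, the function names, and the hyperparameters are assumptions chosen so the gradient can be written by hand, and the alternation with model fine-tuning is not shown.

```python
import numpy as np

def targeted_loss(W, states, delta, target_action):
    """Mean squared distance between the toy policy's output and the target action."""
    residual = (states + delta) @ W.T - target_action
    return float((residual ** 2).sum(axis=1).mean())

def mi_fgsm_trigger(W, states, target_action, eps=0.5, steps=10, mu=1.0):
    """Optimize an additive state trigger `delta` with MI-FGSM.

    W:             (action_dim, state_dim) weights of a toy linear policy (assumption)
    states:        (B, state_dim) batch of clean states
    target_action: (action_dim,) action the trigger should elicit
    """
    alpha = eps / steps                     # per-step size, as in standard MI-FGSM
    delta = np.zeros(states.shape[1])
    g = np.zeros_like(delta)                # accumulated (momentum) gradient
    for _ in range(steps):
        residual = (states + delta) @ W.T - target_action
        # Analytic gradient of the squared-error loss w.r.t. delta, batch-averaged.
        grad = 2.0 * (residual @ W).mean(axis=0)
        # Momentum update on the L1-normalized gradient.
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        # Targeted attack: step against the gradient sign, stay inside the eps-ball.
        delta = np.clip(delta - alpha * np.sign(g), -eps, eps)
    return delta
```

The sign step combined with clipping keeps the trigger within an L-infinity budget `eps`, which is one common way to bound how visible a perturbation is.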

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Formal Methods in Verification