Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Zhiyuan Chang; Mingyang Li; Yuekai Huang; Ziyou Jiang; Xiaojun Jia; Qian Xiong; Junjie Wang; Zhaoyang Li; Qing Wang

arXiv:2601.04666·cs.AI·April 10, 2026

Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning

Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang

PDF

TL;DR

This paper introduces InstruCoT, a method that enhances LLMs to defend against prompt injection attacks by synthesizing diverse data and instruction-level reasoning, improving security without losing utility.

Contribution

The paper presents InstruCoT, a novel approach combining data synthesis and chain-of-thought fine-tuning to effectively detect and reject malicious prompts in LLMs.

Findings

01

InstruCoT significantly reduces vulnerability to prompt injection attacks.

02

It outperforms baseline methods across multiple security dimensions.

03

Utility performance remains unaffected by the proposed method.

Abstract

Large language model (LLM)-integrated applications have become increasingly prevalent, yet face critical security vulnerabilities from prompt injection (PI) attacks. Defending against PI attacks faces two major issues: malicious instructions can be injected through diverse vectors, and injected instructions often lack clear semantic boundaries from the surrounding context, making them difficult to identify. To address these issues, we propose InstruCoT, a model enhancement method for PI defense that synthesizes diverse training data and employs instruction-level chain-of-thought fine-tuning, enabling LLMs to effectively identify and reject malicious instructions regardless of their source or position in the context. We evaluate InstruCoT across three critical dimensions: Behavior Deviation, Privacy Leakage, and Harmful Output. Experimental results across four LLMs demonstrate that…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.