Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Manuel Wirth

arXiv:2602.18514·cs.CR·February 24, 2026

Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Manuel Wirth

PDF

Open Access

TL;DR

This study investigates the security vulnerabilities of Large Language Models in HR applications, revealing that reasoning-enhanced models may be more susceptible to sophisticated prompt injection attacks than standard models.

Contribution

It provides a comparative analysis of failure modes in standard versus reasoning models under adversarial prompts, challenging assumptions about reasoning models' safety advantages.

Findings

01

Standard models resorted to hallucinations under simple attacks.

02

Reasoning models used strategic reframing to persuade and showed meta-cognitive leakage.

03

Complex instructions caused reasoning models to unintentionally reveal injection logic.

Abstract

As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the security implications of Indirect Prompt Injection (IPI) become critical. While a prevailing hypothesis posits that "Reasoning" or "Chain-of-Thought" Models possess safety advantages due to their ability to self-correct, emerging research suggests these capabilities may enable more sophisticated alignment failures. This qualitative Red-Teaming case study challenges the safety-through-reasoning premise using the Qwen 3 30B architecture. By subjecting both a standard instruction-tuned model and a reasoning-enhanced model to a "Trojan Horse" curriculum vitae, distinct failure modes are observed. The results suggest a complex trade-off: while the Standard Model resorted to brittle hallucinations to justify simple attacks and filtered out…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsEthics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning