AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications
Honglin Mu, Jinghao Liu, Kaiyang Wan, Rui Xing, Xiuying Chen, Timothy Baldwin, Wanxiang Che

TL;DR
This paper uncovers adversarial vulnerabilities in LLMs used for resume screening, introduces a benchmark to evaluate attacks, and compares defense strategies, highlighting training-time defenses' superior effectiveness.
Contribution
It presents a new benchmark for assessing adversarial attacks in resume screening and evaluates defense mechanisms, proposing FIDS as an effective training-time solution.
Findings
Attack success rates exceed 80% for certain adversarial methods.
Prompt-based defenses reduce attacks by 10.1% with 12.5% false rejections.
FIDS achieves 15.4% attack reduction with 10.4% false rejections.
Abstract
Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. Notably, while defenses may exist for mature domains such as code review, they are often absent in other common applications such as resume screening and peer review. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types. We evaluate two defense mechanisms: prompt-based defenses achieve 10.1% attack reduction with 12.5% false rejection increase, while our proposed FIDS (Foreign Instruction Detection through Separation) using LoRA adaptation…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
