TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han

TL;DR
TRAPDOC is a framework that subtly injects phantom tokens into documents to deceive over-reliant LLM users: the LLMs they rely on are led to generate plausible but incorrect outputs, which highlights the societal risks of such misuse.
Contribution
We introduce a novel method for injecting imperceptible phantom tokens into documents to manipulate LLM outputs, and develop TRAPDOC to demonstrate this deception in practical scenarios.
Findings
TRAPDOC effectively deceives proprietary LLMs in experiments.
The framework produces plausible yet incorrect outputs.
Our method outperforms several baseline approaches.
Abstract
The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is emerging as a significant social issue. To mitigate these issues, we propose a method that injects imperceptible phantom tokens into documents, causing LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the…
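The abstract as shown here does not say how phantom tokens are embedded, so the following is only a generic illustration of the underlying idea, not TRAPDOC's mechanism: a string can carry characters that a human reader does not see but that a program reading the text still receives. The sketch assumes zero-width spaces (U+200B) as the invisible carrier; the function names `inject_invisible` and `strip_invisible` are hypothetical and chosen for this example.

```python
import unicodedata

# ASSUMPTION: zero-width space as the invisible carrier. This is an
# illustrative choice, not the mechanism described by the paper.
ZERO_WIDTH = "\u200b"  # ZERO WIDTH SPACE, Unicode general category Cf


def inject_invisible(text: str, every: int = 3) -> str:
    """Insert a zero-width space after every `every` characters of `text`."""
    out = []
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        if i % every == 0:
            out.append(ZERO_WIDTH)
    return "".join(out)


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (general category Cf)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


original = "Solve the equation"
altered = inject_invisible(original)

# The two strings typically render identically, yet differ as data.
print(altered == original)
print(len(original), len(altered))
print(strip_invisible(altered) == original)
```

The point of the sketch is only the gap between what is rendered and what is read: `altered` looks like `original` in most viewers, while any program consuming the raw text receives a different character sequence.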
Taxonomy
Topics: Digital and Cyber Forensics · Digital Media Forensic Detection
Methods: Sparse Evolutionary Training
