TL;DR
This paper introduces SteganoPrompt, a web tool embedding invisible instructions into prompts to reliably detect if students copy-paste responses from LLMs, addressing limitations of existing detection methods.
Contribution
It proposes an input-side watermarking method using Unicode tags to identify LLM-generated content without requiring model cooperation.
Findings
The watermark survives common copy-paste channels across multiple platforms.
It is reliably detected by various LLM families.
The tool is publicly available under MIT license.
Abstract
Large language models (LLMs) have made fluent essay writing, code drafting, and quiz answering instantly available to students at every level, from secondary school through graduate study. Many educators do not object to LLM use \emph{per~se}; what they need to detect is the case in which a student pastes the assignment prompt into a chatbot and submits the model's reply verbatim, without engaging with the work. Existing post-hoc AI-text detectors remain unreliable and have been shown to penalise non-native English writers, while output-side watermarks require cooperation from the model provider. We propose an alternative that the educator controls directly: an input-side watermark in which an invisible instruction is embedded inside the visible assignment prompt itself. An LLM that ingests the prompt verbatim quietly reads the hidden instruction and writes a tell-tale signature into…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
