Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts
Louis Give, Timo Zaoral, Maria Antonietta Bruno

TL;DR
This paper explores prompt recovery from AI-generated texts. It compares zero-shot and few-shot in-context learning with LoRA fine-tuning to test whether the original prompt can be reconstructed, offering insights beyond simple detection.
Contribution
To the authors' knowledge, it is the first study to investigate prompt recovery for generated texts across several learning approaches without restricting itself to a closed set of tasks.
Findings
Prompt recovery is feasible with reasonable accuracy.
Fine-tuning and in-context learning improve recovery performance.
Using semi-synthetic datasets enhances evaluation robustness.
Abstract
Today, the detection of AI-generated content is receiving more and more attention. Our idea is to go beyond detection and try to recover the prompt used to generate a text. This paper, to the best of our knowledge, introduces the first investigation in this particular domain without a closed set of tasks. Our goal is to study if this approach is promising. We experiment with zero-shot and few-shot in-context learning but also with LoRA fine-tuning. After that, we evaluate the benefits of using a semi-synthetic dataset. For this first study, we limit ourselves to text generated by a single model. The results show that it is possible to recover the original prompt with a reasonable degree of accuracy.
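The zero-shot and few-shot in-context setups mentioned in the abstract can be illustrated with a minimal sketch that assembles a recovery query from (generated text, original prompt) pairs. The instruction wording, the `Text:`/`Prompt:` template, and the function name `build_recovery_prompt` are assumptions made for illustration; the paper's actual template is not given here.

```python
from typing import Sequence, Tuple

# Hypothetical instruction; the wording is an assumption, not the authors' template.
INSTRUCTION = "Given a generated text, write the prompt that most likely produced it."


def build_recovery_prompt(
    text: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt-recovery query.

    With no examples this is a zero-shot query; each (generated_text,
    original_prompt) pair added to `examples` becomes one in-context
    demonstration, giving a few-shot query.
    """
    parts = [INSTRUCTION, ""]
    for generated, original_prompt in examples:
        parts += [f"Text: {generated}", f"Prompt: {original_prompt}", ""]
    # The query ends on an open "Prompt:" slot for the model to complete.
    parts += [f"Text: {text}", "Prompt:"]
    return "\n".join(parts)


# Illustrative demonstration pair (made up for this sketch).
demos = [("Roses bloom in quiet rows.", "Write a short poem about flowers.")]
zero_shot_query = build_recovery_prompt("Paris is the capital of France.")
few_shot_query = build_recovery_prompt("Paris is the capital of France.", demos)
```

The resulting string would be sent to a language model, whose completion is the recovered prompt. Fine-tuning, for example with LoRA, would instead train the model on such text and prompt pairs directly.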
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training
