Tricking LLM-Based NPCs into Spilling Secrets
Kyohei Shiomi, Zhuotao Lian, Toru Nakanishi, Teruaki Kitasuka

TL;DR
This paper investigates the security risks of using LLMs for game NPC dialogue by demonstrating how adversarial prompts can trick NPCs into revealing secret information.
Contribution
It introduces a novel security concern for LLM-based NPCs and demonstrates how adversarial prompts can manipulate NPCs into revealing hidden secrets.
Findings
Adversarial prompts can successfully induce NPCs to disclose secrets.
Security vulnerabilities in LLM-based NPCs are feasible and pose risks.
Highlights the need for safeguards against prompt injection attacks.
Abstract
Large Language Models (LLMs) are increasingly used to generate dynamic dialogue for game NPCs. However, their integration raises new security concerns. In this study, we examine whether adversarial prompt injection can cause LLM-based NPCs to reveal hidden background secrets that are meant to remain undisclosed.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
