Tricking LLM-Based NPCs into Spilling Secrets

Kyohei Shiomi; Zhuotao Lian; Toru Nakanishi; Teruaki Kitasuka

arXiv:2508.19288·cs.CR·August 28, 2025

Tricking LLM-Based NPCs into Spilling Secrets

Kyohei Shiomi, Zhuotao Lian, Toru Nakanishi, Teruaki Kitasuka

PDF

TL;DR

This paper investigates the security risks of using LLMs for game NPC dialogue by demonstrating how adversarial prompts can trick NPCs into revealing secret information.

Contribution

It introduces a novel security concern for LLM-based NPCs and demonstrates how adversarial prompts can manipulate NPCs into revealing hidden secrets.

Findings

01

Adversarial prompts can successfully induce NPCs to disclose secrets.

02

Security vulnerabilities in LLM-based NPCs are feasible and pose risks.

03

Highlights the need for safeguards against prompt injection attacks.

Abstract

Large Language Models (LLMs) are increasingly used to generate dynamic dialogue for game NPCs. However, their integration raises new security concerns. In this study, we examine whether adversarial prompt injection can cause LLM-based NPCs to reveal hidden background secrets that are meant to remain undisclosed.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.