StealthGraph: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation

Huawei Zheng; Xinqi Jiang; Sen Yang; Shouling Ji; Yingcai Wu; Dazhen Deng

arXiv:2601.04740 · cs.CL · April 21, 2026

TL;DR

StealthGraph is a framework that generates domain-specific, implicit harmful prompts for LLM safety testing. It combines knowledge-graph-guided generation with obfuscation rewriting so that safety evaluations better reflect realistic threats.

Contribution

The paper presents a novel end-to-end method combining knowledge-graph-guided prompt generation and obfuscation rewriting to produce realistic, domain-specific harmful prompts.

Findings

01. Generated datasets are highly domain-relevant and implicit.
02. The approach improves the realism of red-teaming for LLM safety.
03. Code and datasets are publicly available on GitHub.

Abstract

Large language models (LLMs) are increasingly applied in specialized domains such as finance and healthcare, where they introduce unique safety risks. Domain-specific datasets of harmful prompts remain scarce and still largely rely on manual construction; public datasets mainly focus on explicit harmful prompts, which modern LLM defenses can often detect and refuse. In contrast, implicit harmful prompts, expressed through indirect domain knowledge, are harder to detect and better reflect real-world threats. We identify two challenges: transforming domain knowledge into actionable constraints and increasing the implicitness of generated harmful prompts. To address them, we propose an end-to-end framework that first performs knowledge-graph-guided harmful prompt generation to systematically produce domain-relevant prompts, and then applies two-strategy obfuscation rewriting to convert…
