Security awareness in LLM agents: the NDAI zone case
Enrico Bottazzi, Pia Park

TL;DR
This paper investigates how different large language models weigh security evidence in NDAI zones. The models reliably detect danger but struggle to verify safety, a capability that privacy-preserving negotiation depends on.
Contribution
The paper provides an empirical analysis of how LLMs respond to security evidence in NDAI-style negotiations, highlighting their limited ability to verify that an environment is safe.
Findings
Failing attestation suppresses disclosure across models
Passing attestation leads to heterogeneous responses
Current LLMs reliably detect danger but not safety
Abstract
NDAI zones let inventor and investor agents negotiate inside a Trusted Execution Environment (TEE) where any disclosed information is deleted if no deal is reached. This makes full IP disclosure the rational strategy for the inventor's agent. Leveraging this infrastructure, however, requires agents to distinguish a secure environment from an insecure one, a capability LLM agents lack natively, since they can rely only on evidence passed through the context window to form awareness of their execution environment. We ask: How do different LLMs weight various forms of evidence when forming awareness of the security of their execution environment? Using an NDAI-style negotiation task across 10 language models and various evidence scenarios, we find a clear asymmetry: a failing attestation universally suppresses disclosure across all models, whereas a passing attestation produces…
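The disclosure decision described in the abstract can be illustrated with a minimal reference policy. This is a sketch under stated assumptions, not the paper's method: the `Attestation` enum, the `should_disclose_fully` function, and the three scenario labels (including the "absent" case) are hypothetical names introduced here for illustration.

```python
from enum import Enum


class Attestation(Enum):
    """Hypothetical labels for attestation evidence seen in the context window."""
    PASSING = "passing"
    FAILING = "failing"
    ABSENT = "absent"  # assumed scenario: no attestation evidence at all


def should_disclose_fully(attestation: Attestation) -> bool:
    """Illustrative baseline: disclose the full IP only when the TEE
    attestation passes; withhold on failing or missing evidence."""
    return attestation is Attestation.PASSING


if __name__ == "__main__":
    for scenario in Attestation:
        decision = "disclose" if should_disclose_fully(scenario) else "withhold"
        print(f"{scenario.value}: {decision}")
```

Under this baseline, the asymmetry reported in the abstract would correspond to models matching the policy on the failing branch while diverging from it on the passing branch.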
Taxonomy
Topics: Access Control and Trust · Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management
