SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models

Adar Avsian; Larry Heck

arXiv:2603.29846·cs.CL·April 1, 2026

TL;DR

This paper introduces SNEAK, a benchmark for assessing how well large language models can communicate selectively, balancing informativeness and secrecy in multi-agent scenarios.

Contribution

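The paper presents SNEAK, a benchmark for evaluating strategic communication and information leakage in language models. Existing benchmarks focus on reasoning, factual knowledge, or instruction following and do not directly measure strategic communication under asymmetric information.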

Findings

1. Humans outperform all evaluated models by a large margin.
2. Current models struggle to balance informativeness and secrecy effectively.
3. Strategic communication remains a challenging capability for modern language models.

Abstract

Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction following, and do not directly measure strategic communication under asymmetric information. We introduce SNEAK (Secret-aware Natural language Evaluation for Adversarial Knowledge), a benchmark for evaluating selective information sharing in language models. In SNEAK, a model is given a semantic category, a candidate set of words, and a secret word, and must generate a message that indicates knowledge of the secret without revealing it too clearly. We evaluate generated messages using two…
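
The task setup described in the abstract (a semantic category, a candidate set of words, and a secret word drawn from that set) can be pictured with a small sketch. Everything below is an assumption made for illustration: the class name `SneakInstance`, its field names, the prompt wording in `build_prompt`, and the helper `mentions_secret_verbatim` do not come from the paper or from the released dataset. The verbatim check is only a naive sanity test for the most obvious kind of leak; it is not one of the benchmark's evaluation measures, which are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SneakInstance:
    """Hypothetical container for one task instance (field names are assumptions)."""
    category: str                 # semantic category, e.g. "fruit"
    candidates: Tuple[str, ...]   # candidate set of words
    secret: str                   # the secret word, drawn from the candidates

    def __post_init__(self) -> None:
        # The abstract implies the secret is one of the candidate words.
        if self.secret not in self.candidates:
            raise ValueError("secret must be one of the candidates")


def build_prompt(inst: SneakInstance) -> str:
    """Assemble an illustrative prompt; the wording is NOT the paper's."""
    words = ", ".join(inst.candidates)
    return (
        f"Category: {inst.category}\n"
        f"Candidate words: {words}\n"
        f"Secret word: {inst.secret}\n"
        "Write a message that shows you know the secret word "
        "without revealing it too clearly."
    )


def mentions_secret_verbatim(inst: SneakInstance, message: str) -> bool:
    """Naive sanity check (an assumption, not a benchmark metric):
    does the message contain the secret as a whole word, ignoring case?"""
    pattern = r"\b" + re.escape(inst.secret) + r"\b"
    return re.search(pattern, message, flags=re.IGNORECASE) is not None
```

A message that names the secret outright fails the task trivially, so a check like this could serve as a cheap pre-filter before any real scoring. It says nothing about subtler leakage, such as a hint that lets an adversary rule out every other candidate.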

Code & Models

Datasets

avalab/sneak
dataset· 28 dl
