CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
Wenjie Fu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Lukas Wutschitz, Robert Sim, Saravan Rajmohan, Dongmei Zhang

TL;DR
This paper introduces CI-Work, a benchmark for evaluating privacy risks in enterprise LLM agents, revealing prevalent privacy violations and the need for context-centric solutions.
Contribution
The paper presents a Contextual Integrity-grounded benchmark for assessing privacy in enterprise LLM agents and shows that frontier models frequently leak sensitive context.
Findings
Privacy violation rates across the evaluated frontier models range from 15.8% to 50.9%, with leakage reaching up to 26.7%.
Higher task utility often correlates with increased privacy violations.
Scaling models does not mitigate privacy risks.
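To make the utility and violation figures above concrete, here is a minimal Python sketch of how such rates could be tallied. It is an illustration under stated assumptions, not the paper's scoring method: the `Case` structure, the substring matching, and the definitions used here (utility as the fraction of essential items conveyed, a violation as any response containing at least one sensitive item) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One hypothetical test case (structure assumed, not from the paper)."""
    essential: list[str]  # items the response should convey
    sensitive: list[str]  # items the response should withhold
    response: str         # the agent's output


def score(case: Case) -> tuple[float, bool]:
    """Return (utility, violated) using naive case-insensitive substring matching."""
    text = case.response.lower()
    conveyed = sum(item.lower() in text for item in case.essential)
    leaked = sum(item.lower() in text for item in case.sensitive)
    # Assumed definition: utility is the share of essential items conveyed.
    utility = conveyed / len(case.essential) if case.essential else 1.0
    # Assumed definition: any leaked sensitive item counts as a violation.
    return utility, leaked > 0


def summarize(cases: list[Case]) -> dict[str, float]:
    """Aggregate mean utility and violation rate over a non-empty list of cases."""
    scores = [score(c) for c in cases]
    n = len(scores)
    return {
        "mean_utility": sum(u for u, _ in scores) / n,
        "violation_rate": sum(v for _, v in scores) / n,
    }


cases = [
    Case(["q3 deadline"], ["salary"], "The Q3 deadline is Friday."),
    Case(["q3 deadline", "budget"], ["salary"],
         "The Q3 deadline is Friday; Dana's salary is under review."),
]
print(summarize(cases))
```

A real evaluation would likely need a judge more robust than substring matching, since sensitive content can be paraphrased; the sketch only shows how the two quantities can be tracked side by side for the same set of responses.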
Abstract
Enterprise LLM agents can dramatically improve workplace productivity, but their core capability, retrieving and using internal context to act on a user's behalf, also creates new risks for sensitive information leakage. We introduce CI-Work, a Contextual Integrity (CI)-grounded benchmark that simulates enterprise workflows across five information-flow directions and evaluates whether agents can convey essential content while withholding sensitive context in dense retrieval settings. Our evaluation of frontier models reveals that privacy failures are prevalent (violation rates range from 15.8% to 50.9%, with leakage reaching up to 26.7%) and uncovers a counterintuitive trade-off critical for industrial deployment: higher task utility often correlates with increased privacy violations. Moreover, the massive scale of enterprise data and potential user behavior further amplify this…
