MPCI-Bench: A Benchmark for Multimodal Pairwise Contextual Integrity Evaluation of Language Model Agents

Shouju Wang; Haopeng Zhang

arXiv:2601.08235·cs.AI·January 27, 2026

Open Access

TL;DR

MPCI-Bench is a multimodal benchmark that evaluates how well language model agents adhere to social privacy norms across visual and textual data, addressing gaps in existing text-centric Contextual Integrity (CI) assessments.

Contribution

The paper introduces the first multimodal pairwise CI benchmark, together with a comprehensive evaluation pipeline, and reveals systematic privacy-utility trade-offs in current models.

Findings

01. State-of-the-art models fail to balance privacy and utility.

02. The visual modality leaks more sensitive information than text.

03. The benchmark will be open-sourced for future research.

Abstract

As language-model agents evolve from passive chatbots into proactive assistants that handle personal data, evaluating their adherence to social norms becomes increasingly critical, often through the lens of Contextual Integrity (CI). However, existing CI benchmarks are largely text-centric and primarily emphasize negative refusal scenarios, overlooking multimodal privacy risks and the fundamental trade-off between privacy and utility. In this paper, we introduce MPCI-Bench, the first Multimodal Pairwise Contextual Integrity benchmark for evaluating privacy behavior in agentic settings. MPCI-Bench consists of paired positive and negative instances derived from the same visual source and instantiated across three tiers: normative Seed judgments, context-rich Story reasoning, and executable agent action Traces. Data quality is ensured through a Tri-Principle Iterative Refinement pipeline.…
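The pairwise design described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the field names, the `PairDecision` record, and the three scores (`utility`, `privacy`, `pair_accuracy`) are hypothetical, and the benchmark's actual schema and metrics may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # The three tiers named in the abstract.
    SEED = "seed"
    STORY = "story"
    TRACE = "trace"


@dataclass(frozen=True)
class PairedInstance:
    # Hypothetical schema: one visual source, two contexts.
    image_id: str
    tier: Tier
    positive_context: str  # information flow assumed to be appropriate
    negative_context: str  # information flow assumed to violate CI


@dataclass(frozen=True)
class PairDecision:
    # Hypothetical record of what an agent did on each half of a pair.
    shared_in_positive: bool
    shared_in_negative: bool


def score_pairs(decisions):
    """Return illustrative utility, privacy, and pair-level scores."""
    n = len(decisions)
    if n == 0:
        raise ValueError("no decisions to score")
    # Utility: the agent shares when sharing is appropriate.
    utility = sum(d.shared_in_positive for d in decisions) / n
    # Privacy: the agent withholds when sharing is inappropriate.
    privacy = sum(not d.shared_in_negative for d in decisions) / n
    # Pair accuracy: both halves of the same pair are handled correctly.
    both = sum(d.shared_in_positive and not d.shared_in_negative for d in decisions)
    return {"utility": utility, "privacy": privacy, "pair_accuracy": both / n}
```

Under these assumed definitions, an agent that refuses every request scores 1.0 on privacy but 0.0 on utility and pair accuracy, which is the kind of trade-off a refusal-only benchmark cannot expose.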

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education