Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information

Umid Suleymanov; Zaur Rajabov; Emil Mirzazada; Murat Kantarcioglu

arXiv:2602.21496 · cs.AI · February 26, 2026

Open Access

TL;DR

This paper introduces SemSIEdit, a novel inference-time framework that enables large language models to self-correct and rewrite sensitive information, balancing privacy and utility more effectively than simple refusal methods.

Contribution

The paper proposes SemSIEdit, an agentic rewriting approach that improves privacy protection while maintaining utility, and analyzes safety dynamics across different model scales.

Findings

01

Rewrites reduce sensitive information leakage by 34.6%.

02

Utility loss from rewriting is limited to 9.8%.

03

Large models expand safety through nuanced reasoning, while smaller models tend to truncate.
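These findings describe a trade-off between leakage and utility. As an illustration only, the sketch below identifies the non-dominated (Pareto-optimal) configurations when each defense is scored on a leakage metric (lower is better) and a utility metric (higher is better). The `Config` type, the function names, the metric definitions, and every number in the example are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    leakage: float  # hypothetical leakage score; lower is better
    utility: float  # hypothetical task-quality score; higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.leakage <= b.leakage and a.utility >= b.utility
    strictly_better = a.leakage < b.leakage or a.utility > b.utility
    return no_worse and strictly_better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates, sorted by leakage."""
    frontier = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(frontier, key=lambda c: c.leakage)


# Illustrative numbers only; they are not results from the paper.
candidates = [
    Config("no_defense", leakage=0.50, utility=1.00),
    Config("refusal", leakage=0.05, utility=0.20),
    Config("rewrite", leakage=0.30, utility=0.90),
    Config("weak_filter", leakage=0.40, utility=0.80),
]
print([c.name for c in pareto_frontier(candidates)])
# prints: ['refusal', 'rewrite', 'no_defense']
```

Here `weak_filter` drops off the frontier because `rewrite` both leaks less and keeps more utility; refusal and no defense survive only as the two extremes of the trade-off.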

Abstract

While defenses for structured PII are mature, Large Language Models (LLMs) pose a new threat: Semantic Sensitive Information (SemSI), where models infer sensitive identity attributes, generate reputation-harmful content, or hallucinate potentially wrong information. The capacity of LLMs to self-regulate these complex, context-dependent sensitive information leaks without destroying utility remains an open scientific question. To address this, we introduce SemSIEdit, an inference-time framework where an agentic "Editor" iteratively critiques and rewrites sensitive spans to preserve narrative flow rather than simply refusing to answer. Our analysis reveals a Privacy-Utility Pareto Frontier, where this agentic rewriting reduces leakage by 34.6% across all three SemSI categories while incurring a marginal utility loss of 9.8%. We also uncover a Scale-Dependent Safety Divergence: large…
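The abstract describes an Editor that iteratively critiques and rewrites sensitive spans instead of refusing. A minimal sketch of such a critique-and-rewrite loop follows, assuming the critic returns character-offset spans tagged with a SemSI category and the rewriter returns replacement text for one span. In SemSIEdit the Editor is an agentic LLM step; here `toy_critique`, `toy_rewrite`, `TOY_RULES`, and `max_rounds` are hypothetical rule-based stand-ins so the loop runs on its own, and nothing here reproduces the authors' prompts or code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A flagged span: (start offset, end offset, SemSI category). Hypothetical format.
Span = Tuple[int, int, str]


@dataclass
class EditResult:
    text: str
    rounds: int  # number of rounds that actually performed rewrites
    history: List[str] = field(default_factory=list)


def editor_loop(
    text: str,
    critique: Callable[[str], List[Span]],
    rewrite_span: Callable[[str, Span], str],
    max_rounds: int = 3,
) -> EditResult:
    """Critique, then rewrite every flagged span; repeat until clean or out of budget."""
    history = [text]
    for round_idx in range(1, max_rounds + 1):
        spans = critique(text)
        if not spans:
            # Critic found nothing left to fix: return the edited text.
            return EditResult(text, round_idx - 1, history)
        # Apply edits right-to-left so earlier offsets remain valid.
        for span in sorted(spans, key=lambda s: s[0], reverse=True):
            start, end, _category = span
            text = text[:start] + rewrite_span(text, span) + text[end:]
        history.append(text)
    return EditResult(text, max_rounds, history)


# Hypothetical stand-ins for LLM calls: phrase -> (category, neutral rewrite).
TOY_RULES: Dict[str, Tuple[str, str]] = {
    "is likely diabetic": ("identity_attribute", "has health matters they keep private"),
    "was arrested for fraud": ("reputation_harm", "was in the news"),
}


def toy_critique(text: str) -> List[Span]:
    """Flag the first occurrence of each rule phrase."""
    spans: List[Span] = []
    for phrase, (category, _) in TOY_RULES.items():
        idx = text.find(phrase)
        if idx != -1:
            spans.append((idx, idx + len(phrase), category))
    return spans


def toy_rewrite(text: str, span: Span) -> str:
    """Return a generalized replacement for the flagged phrase."""
    start, end, _category = span
    return TOY_RULES[text[start:end]][1]


result = editor_loop(
    "Our neighbor Sam is likely diabetic and was arrested for fraud last year.",
    toy_critique,
    toy_rewrite,
)
print(result.text)
# prints: Our neighbor Sam has health matters they keep private and was in the news last year.
```

The sentence survives with its narrative intact rather than being refused outright. Rewriting from the last span backwards keeps earlier offsets valid within a round, and the round budget bounds cost if the critic keeps flagging new spans.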

Taxonomy

Topics: Misinformation and Its Impacts · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education