It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Sangwoo Park; Woongyeong Yeo; Seanie Lee; Yumin Choi; Hyomin Lee; Kangsan Kim; Jinheon Baek; Seong Joon Oh; Sung Ju Hwang

arXiv:2605.20258·cs.LG·May 21, 2026

TL;DR

This paper introduces SELFCI, a self-distillation framework that improves how large language models balance privacy and utility by jointly optimizing two separate objectives: one that preserves task-relevant information and one that enforces minimal, appropriate disclosure.

Contribution

SELFCI is a novel self-distillation approach that decouples information suppression from task resolution, improving privacy-utility trade-offs in LLMs without external supervision.

Findings

01 SELFCI outperforms baselines like GRPO in privacy-utility tasks.

02 It maintains performance in out-of-domain agentic workflows.

03 SELFCI aligns model behavior with privacy norms effectively.

Abstract

Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical. However, even frontier models remain unreliable in making disclosure decisions, and existing mitigation strategies often degrade underlying task performance. To overcome this privacy-utility trade-off, we propose SELFCI, a complementary self-distillation framework that decouples information suppression from task resolution. SELFCI jointly optimizes two independent reverse KL divergences over distinct teacher distributions derived from feedback: one encourages preserving task-relevant information for utility, while the other enforces minimal and appropriate disclosure. This complementary…
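The objective described in the abstract, two reverse KL terms against two different teacher distributions, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the weights `w_utility` and `w_privacy`, the summed combination, and the use of plain logits over a toy vocabulary are all assumptions made for illustration. How SELFCI actually derives its teacher distributions from feedback is not specified on this page.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student, teacher, eps=1e-12):
    """Reverse KL, i.e. KL(student || teacher), over a shared discrete support.

    The expectation is taken under the student distribution, which is what
    distinguishes reverse KL from the forward direction KL(teacher || student).
    """
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student, teacher)
        if s > 0.0
    )


def complementary_loss(student_logits, utility_teacher_logits,
                       privacy_teacher_logits,
                       w_utility=1.0, w_privacy=1.0):
    """Hypothetical combined loss: a weighted sum of two reverse KL terms.

    ASSUMPTION: the weighted sum and both weights are illustrative only;
    the source states that two reverse KL divergences are optimized jointly
    but does not give the exact form.
    """
    student = softmax(student_logits)
    utility_teacher = softmax(utility_teacher_logits)
    privacy_teacher = softmax(privacy_teacher_logits)
    utility_term = reverse_kl(student, utility_teacher)
    privacy_term = reverse_kl(student, privacy_teacher)
    return w_utility * utility_term + w_privacy * privacy_term


if __name__ == "__main__":
    # Toy 3-token vocabulary; the numbers are made up for the example.
    loss = complementary_loss(
        student_logits=[1.0, 2.0, 3.0],
        utility_teacher_logits=[1.0, 2.5, 3.0],
        privacy_teacher_logits=[2.0, 2.0, 1.0],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch each term is zero only when the student matches the corresponding teacher, so the combined loss reaches zero only if the student matches both teachers at once. When the teachers disagree, the weights determine how the student trades one off against the other.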

Code & Models

Repositories

sw-programmer/SelfCI (GitHub)
