TL;DR
This paper introduces SELFCI, a self-distillation framework that improves how large language models balance privacy and utility by jointly optimizing two separate objectives: one for task performance and one for minimal, appropriate disclosure.
Contribution
SELFCI is a novel self-distillation approach that decouples information suppression from task resolution, improving privacy-utility trade-offs in LLMs without external supervision.
Findings
SELFCI outperforms baselines such as GRPO on the privacy-utility trade-off.
It maintains performance in out-of-domain agentic workflows.
SELFCI aligns model behavior with privacy norms effectively.
Abstract
Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical. However, even frontier models remain unreliable in making disclosure decisions, and existing mitigation strategies often degrade underlying task performance. To overcome this privacy-utility trade-off, we propose SELFCI, a complementary self-distillation framework that decouples information suppression from task resolution. SELFCI jointly optimizes two independent reverse KL divergences over distinct teacher distributions derived from feedback: one encourages preserving task-relevant information for utility, while the other enforces minimal and appropriate disclosure. This complementary…
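The "two independent reverse KL divergences" mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes discrete next-token distributions over a three-token vocabulary, assumes the two terms are simply added with a hypothetical `weight` parameter, and uses hand-written teacher distributions, since the abstract does not say how the teachers are derived from feedback. The names `reverse_kl` and `combined_loss` are illustrative only.

```python
import math


def reverse_kl(student, teacher, eps=1e-12):
    """Reverse KL, KL(student || teacher), for discrete distributions.

    The expectation is taken under the student, which is what makes
    this the 'reverse' direction in a distillation setting.
    """
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student, teacher)
        if s > 0
    )


def combined_loss(student, utility_teacher, privacy_teacher, weight=1.0):
    """Sum of two reverse KL terms against two distinct teachers.

    ASSUMPTION: additive combination with a scalar weight; the abstract
    only states that the two divergences are jointly optimized.
    """
    utility_term = reverse_kl(student, utility_teacher)
    privacy_term = reverse_kl(student, privacy_teacher)
    return utility_term + weight * privacy_term


# Hypothetical toy distributions over a three-token vocabulary.
student = [0.5, 0.3, 0.2]
utility_teacher = [0.6, 0.3, 0.1]   # favors task-relevant tokens
privacy_teacher = [0.7, 0.2, 0.1]   # favors minimal disclosure

loss = combined_loss(student, utility_teacher, privacy_teacher)
print(f"combined reverse-KL loss: {loss:.4f}")
```

Each term is zero only when the student matches the corresponding teacher, so when the two teachers differ, as in this toy case, a student minimizing the sum has to settle on a compromise between them.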
