Reinforcing privacy reasoning in LLMs via normative simulacra from fiction
Matt Franchi, Madiha Zahrah Choksi, Harold Triedman, and Helen Nissenbaum

TL;DR
This paper aims to improve privacy reasoning in large language models by extracting norms and information flows from fiction as structured "normative simulacra," then fine-tuning on them with supervised learning followed by GRPO reinforcement learning so that models better match contextual privacy expectations.
Contribution
The paper proposes extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning, with the goal of improving privacy reasoning.
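To make "structured normative simulacra" concrete, here is a minimal sketch of what one such record might look like. The field names follow the five standard Contextual Integrity flow parameters (sender, recipient, subject, information type, transmission principle). The class names, the `source` field, and the completeness check are hypothetical illustrations, not the authors' schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformationFlow:
    """One information flow, described by the five CI parameters."""
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str


@dataclass(frozen=True)
class Norm:
    """A context-relative norm (hypothetical layout)."""
    context: str    # e.g. "doctor-patient consultation"
    statement: str  # natural-language description of the expectation


@dataclass
class NormativeSimulacrum:
    """Hypothetical container for norms and flows extracted from one passage."""
    source: str  # identifier of the passage the record was extracted from
    flows: list[InformationFlow] = field(default_factory=list)
    norms: list[Norm] = field(default_factory=list)

    def is_structurally_complete(self) -> bool:
        # Assumed check: at least one flow and one norm, with no empty fields.
        if not self.flows or not self.norms:
            return False
        flow_ok = all(all(vars(f).values()) for f in self.flows)
        norm_ok = all(all(vars(n).values()) for n in self.norms)
        return flow_ok and norm_ok


example = NormativeSimulacrum(
    source="hypothetical-novel-chapter-3",
    flows=[InformationFlow(
        sender="physician",
        recipient="patient's employer",
        subject="patient",
        information_type="medical diagnosis",
        transmission_principle="without the patient's consent",
    )],
    norms=[Norm(
        context="doctor-patient consultation",
        statement="Diagnoses are shared with third parties only with consent.",
    )],
)
```

A record like this keeps the flow and the norm it is judged against side by side, which is the pairing a CI-based analysis needs.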
Findings
GRPO with normative grounding outperforms other methods on law compliance benchmarks.
Fiction-derived norms improve models' recognition of privacy-relevant situations.
Models trained with normative simulacra better align with human privacy expectations.
Abstract
Information handling practices of LLM agents are broadly misaligned with the contextual privacy expectations of their users. Contextual Integrity (CI) provides a principled framework, defining privacy as the appropriate flow of information within context-relative norms. However, existing approaches either double inference cost via supervisor-assistant architectures, or fine-tune on narrow task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning. Our composite reward function combines programmatic signals, including task clarity (subsuming schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that…
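As a rough illustration of the training signal described in the abstract, the sketch below combines the four named programmatic signals with an LLM-judge score and then computes group-relative advantages in the style of GRPO. The abstract is truncated before it explains how the judge is used, so the equal-weight average, the `judge_weight` parameter, the assumption that all scores lie in [0, 1], and both function names are assumptions made for illustration only.

```python
from statistics import mean, pstdev

# Programmatic signals named in the abstract; task clarity is described there
# as subsuming schema validity, construct discrimination, and extraction confidence.
PROGRAMMATIC_SIGNALS = (
    "task_clarity",
    "structural_completeness",
    "internal_consistency",
    "context_identification",
)


def composite_reward(signals: dict[str, float], judge_score: float,
                     judge_weight: float = 0.5) -> float:
    """Blend programmatic signals with a judge score (weights are assumed)."""
    programmatic = mean(signals[name] for name in PROGRAMMATIC_SIGNALS)
    return (1.0 - judge_weight) * programmatic + judge_weight * judge_score


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its sampling group, as in GRPO.

    Each completion sampled for the same prompt is scored, and its advantage
    is its reward minus the group mean, divided by the group standard
    deviation. Population standard deviation is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score a group of sampled completions for one prompt.
group = [
    composite_reward(dict.fromkeys(PROGRAMMATIC_SIGNALS, 1.0), judge_score=0.8),
    composite_reward(dict.fromkeys(PROGRAMMATIC_SIGNALS, 0.5), judge_score=0.4),
    composite_reward(dict.fromkeys(PROGRAMMATIC_SIGNALS, 0.0), judge_score=0.1),
]
advantages = group_relative_advantages(group)
```

Because advantages are computed relative to the group, only the ranking and spread of rewards within a group matter, not their absolute scale.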
