TL;DR
This paper introduces a reinforcement learning approach that improves the reasoning, safety, and privacy compliance of large language models by aligning them with legal standards such as the GDPR, the EU AI Act, and HIPAA.
Contribution
It formulates safety and privacy issues as contextualized compliance problems and employs RL with rule-based rewards to enhance reasoning and normative compliance in LLMs.
Findings
Achieved +8.58% accuracy on safety/privacy benchmarks.
Improved reasoning accuracy by +2.05% on MMLU.
Enhanced legal compliance and reasoning capabilities.
Abstract
While Large Language Models (LLMs) exhibit remarkable capabilities, they also introduce significant safety and privacy risks. Current mitigation strategies often fail to preserve contextual reasoning capabilities in risky scenarios. Instead, they rely heavily on sensitive pattern matching to protect LLMs, which limits the scope. Furthermore, they overlook established safety and privacy standards, leading to systemic risks for legal compliance. To address these gaps, we formulate safety and privacy issues into contextualized compliance problems following the Contextual Integrity (CI) theory. Under the CI framework, we align our model with three critical regulatory standards: GDPR, EU AI Act, and HIPAA. Specifically, we employ reinforcement learning (RL) with a rule-based reward to incentivize contextual reasoning capabilities while enhancing compliance with safety and privacy norms.…
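To make the idea of a rule-based reward concrete, here is a minimal sketch of how such a reward could be computed for one model response. It does not reproduce the paper's reward, which the abstract does not specify. The `<think>`/`<answer>` output format, the verdict labels `permitted` and `prohibited`, the function name `rule_based_reward`, and the reward values are all assumptions made for illustration.

```python
import re

# Hypothetical output format (an assumption, not taken from the paper):
# reasoning inside <think> tags followed by a verdict inside <answer> tags.
RESPONSE_PATTERN = re.compile(
    r"^<think>(?P<reasoning>.+?)</think>\s*<answer>(?P<verdict>.+?)</answer>$",
    re.DOTALL,
)

# Hypothetical compliance labels for a contextual information flow.
ALLOWED_VERDICTS = {"permitted", "prohibited"}


def rule_based_reward(response: str, gold_verdict: str) -> float:
    """Score a response using fixed rules instead of a learned reward model.

    Returns -1.0 for a malformed response, 0.0 for a well-formed but wrong
    verdict, and 1.0 for a well-formed, correct verdict. These values are
    illustrative assumptions.
    """
    match = RESPONSE_PATTERN.match(response.strip())
    if match is None:
        return -1.0  # format rule violated: missing reasoning or answer block

    verdict = match.group("verdict").strip().lower()
    if verdict not in ALLOWED_VERDICTS:
        return -1.0  # answer is not one of the expected labels

    # Correctness rule: compare against the annotated compliance label.
    return 1.0 if verdict == gold_verdict.strip().lower() else 0.0
```

In an RL loop, a scalar like this would be computed for each sampled response and passed to the policy optimizer. Because the rules check both the presence of a reasoning block and the correctness of the verdict, the model is pushed to reason about the context before answering rather than to match sensitive keywords.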
Taxonomy
Methods: Balanced Selection · ALIGN
