GDPR Auto-Formalization with AI Agents and Human Verification

Ha Thanh Nguyen; Wachara Fungwacharakorn; Sabine Wehnert; May Myo Zin; Yuntao Kong; Jieying Xue; Michał Araszkiewicz; Randy Goebel; Ken Satoh

arXiv:2604.14607·cs.AI·May 5, 2026

TL;DR

This paper presents a multi-agent, human-in-the-loop framework for automatic formalization of GDPR provisions using large language models, emphasizing verification for legal accuracy.

Contribution

The paper introduces a role-specialized, iterative AI-human workflow for GDPR formalization, uses it to construct a dataset, and analyzes both successful and problematic cases.

Findings

01 Structured verification improves the reliability of formalization.

02 Human oversight is crucial for handling legal nuance.

03 The authors construct a high-quality dataset for GDPR auto-formalization.

Abstract

We study the overall process of automatic formalization of GDPR provisions using large language models, within a human-in-the-loop verification framework. Rather than aiming for full autonomy, we adopt a role-specialized workflow in which LLM-based AI components, operating in a multi-agent setting with iterative feedback, generate legal scenarios, formal rules, and atomic facts. This is coupled with independent verification modules which include human reviewers' assessment of representational, logical, and legal correctness. Using this approach, we construct a high-quality dataset to be used for GDPR auto-formalization, and analyze both successful and problematic cases. Our results show that structured verification and targeted human oversight are essential for reliable legal formalization, especially in the presence of legal nuance and context-sensitive reasoning.
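The generate-then-verify loop described in the abstract can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `Candidate` and `Review` types, the `formalize` function, the toy generator and verifiers, and the Prolog-like rule strings are illustrative stand-ins. A real system would call an LLM in place of `toy_generate` and route candidates to human reviewers in place of `toy_human_review`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    """One formalization attempt: a scenario, formal rules, and atomic facts."""
    scenario: str
    rules: List[str]
    facts: List[str]


@dataclass
class Review:
    """Verdict of one verification module (automated or human)."""
    approved: bool
    feedback: str = ""


# Hypothetical interfaces: a generator takes the provision plus prior feedback,
# and a verifier inspects a candidate and returns a Review.
Generator = Callable[[str, List[str]], Candidate]
Verifier = Callable[[Candidate], Review]


def formalize(
    provision: str,
    generate: Generator,
    verifiers: List[Verifier],
    max_rounds: int = 3,
) -> Tuple[Optional[Candidate], int]:
    """Regenerate until every verifier approves or the round budget runs out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(provision, feedback)
        reviews = [verify(candidate) for verify in verifiers]
        feedback = [r.feedback for r in reviews if not r.approved]
        if not feedback:
            return candidate, round_no
    return None, max_rounds


def toy_generate(provision: str, feedback: List[str]) -> Candidate:
    # Stand-in for an LLM agent: it only supplies facts after being told to.
    facts = ["consent_given(alice)"] if feedback else []
    return Candidate(
        scenario=f"Scenario illustrating: {provision}",
        rules=["lawful(P) :- consent_given(P)."],  # placeholder rule syntax
        facts=facts,
    )


def toy_fact_check(candidate: Candidate) -> Review:
    # Stand-in for an automated representational check.
    if not candidate.facts:
        return Review(False, "no atomic facts supplied")
    return Review(True)


def toy_human_review(candidate: Candidate) -> Review:
    # Stand-in for a human reviewer; always approves in this sketch.
    return Review(True)


candidate, rounds = formalize(
    "Art. 6(1)(a) GDPR", toy_generate, [toy_fact_check, toy_human_review]
)
print(rounds, candidate.facts if candidate else None)
```

In this toy run the first candidate is rejected for lacking atomic facts, the feedback is passed back to the generator, and the second candidate is accepted. Returning `None` when the round budget is exhausted is one possible design choice; it leaves unresolved cases to be escalated rather than silently accepted.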

Code & Models

Datasets

nguyenthanhasia/gdpr-cases
dataset · 63 downloads
