GenAIPABench: A Benchmark for Generative AI-based Privacy Assistants

Aamir Hamid; Hemanth Reddy Samidi; Tim Finin; Primal Pappachan; Roberto Yus

arXiv:2309.05138·cs.CR·December 20, 2023

TL;DR

This paper introduces GenAIPABench, a comprehensive benchmark for evaluating the effectiveness and reliability of generative AI-based privacy assistants in understanding and responding to privacy policy questions.

Contribution

It presents a new benchmark with questions, metrics, and tools to assess genAI privacy assistants, and evaluates leading systems to identify strengths and challenges.

Findings

1. GenAI systems show promise in privacy assistance.
2. Challenges remain in response accuracy and consistency.
3. Evaluation highlights areas for improvement in genAI privacy tools.

Abstract

Privacy policies of websites are often lengthy and intricate. Privacy assistants help simplify policies and make them more accessible and user-friendly. The emergence of generative AI (genAI) offers new opportunities to build privacy assistants that can answer users' questions about privacy policies. However, genAI's reliability is a concern due to its potential for producing inaccurate information. This study introduces GenAIPABench, a benchmark for evaluating Generative AI-based Privacy Assistants (GenAIPAs). GenAIPABench includes: 1) a set of questions about privacy policies and data protection regulations, with annotated answers for various organizations and regulations; 2) metrics to assess the accuracy, relevance, and consistency of responses; and 3) a tool for generating prompts that introduce privacy documents and varied privacy questions to test system robustness. We…
