Building cross-language corpora for human understanding of privacy policies
Francesco Ciclosi, Silvia Vidor, and Fabio Massacci

TL;DR
This paper presents a methodology for creating comparable cross-language privacy policy corpora, so that studies of how users understand privacy policies can be replicated in languages other than English. It demonstrates the methodology with an English-Italian comparison.
Contribution
It introduces a novel methodology for building cross-language privacy policy corpora and applies it to English and Italian, making it easier to run studies of user understanding in multiple languages.
Findings
Extended privacy policy corpus for English and Italian
Identified challenges in replicating privacy understanding studies across languages
Provided a framework for cross-language corpus construction
Abstract
Making sure that users understand the privacy policies that affect them is a key challenge for a real-world GDPR deployment. Research studies are mostly carried out in English, but in Europe and elsewhere many users speak a language other than English. Replicating these studies in different languages requires comparable cross-language privacy policy corpora. This work provides a methodology for building comparable cross-language privacy policy corpora in a national language and a reference study language. We illustrate the methodology with an English-Italian application that extends the corpus of one of the first studies on users' understanding of technical terms in privacy policies. We also investigate other open issues that can make replication harder.
Taxonomy
Topics: Privacy, Security, and Data Protection · Ethics and Social Impacts of AI · Privacy-Preserving Technologies in Data
