Extractive text summarisation of Privacy Policy documents using machine learning approaches

Chanwoo Choi

arXiv:2404.08686 · cs.CL · April 16, 2024 · 2 cites

TL;DR

This paper presents two clustering-based models for summarizing Privacy Policy documents, with the Pre-determined Centroid (PDC) approach outperforming K-means in extracting GDPR-relevant sentences.

Contribution

Introduces a novel PDC clustering-based summarization method tailored for GDPR compliance in privacy policies, outperforming traditional K-means clustering.

Findings

01. PDC model outperforms K-means on both ROUGE and SSD metrics

02. PDC effectively segregates sentences based on GDPR topics

03. Task-specific fine-tuning improves unsupervised summarization results

Abstract

This work demonstrates two Privacy Policy (PP) summarisation models based on two different clustering algorithms: K-means clustering and Pre-determined Centroid (PDC) clustering. K-means was chosen for the first model after an extensive evaluation of ten commonly used clustering algorithms. The PDC-based summariser segregates the individual sentences of a PP document by the Euclidean distance from each sentence to pre-defined cluster centres. The cluster centres are defined according to the 14 essential topics that the General Data Protection Regulation (GDPR) requires in any privacy notice. The PDC model outperformed the K-means model on two evaluation methods, Sum of Squared Distance (SSD) and ROUGE, by 27% and 24% respectively. This result contrasts with the K-means model's better performance in the general clustering of…
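The PDC assignment step and the SSD measure described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the 2-D toy vectors, the two centroids (standing in for the 14 GDPR topic centres), and the rule of keeping the single closest sentence per topic are all assumptions made for illustration. A real system would use sentence embeddings of much higher dimension.

```python
import math

def assign_to_centroids(sentence_vecs, centroids):
    """Label each sentence vector with the index of its nearest fixed centroid.

    Unlike K-means, the centroids are given up front and are never updated.
    """
    labels = []
    for vec in sentence_vecs:
        distances = [math.dist(vec, c) for c in centroids]  # Euclidean distance
        labels.append(distances.index(min(distances)))
    return labels

def sum_of_squared_distance(sentence_vecs, centroids, labels):
    """SSD: total squared Euclidean distance from each vector to its assigned centroid."""
    return sum(math.dist(v, centroids[k]) ** 2 for v, k in zip(sentence_vecs, labels))

def summarise(sentences, sentence_vecs, centroids):
    """Keep, for every topic centroid, the closest sentence assigned to it.

    The one-sentence-per-topic selection rule is a hypothetical choice for this sketch.
    """
    labels = assign_to_centroids(sentence_vecs, centroids)
    summary = {}
    for topic in range(len(centroids)):
        members = [i for i, k in enumerate(labels) if k == topic]
        if members:
            best = min(members, key=lambda i: math.dist(sentence_vecs[i], centroids[topic]))
            summary[topic] = sentences[best]
    return summary

# Toy data (hypothetical): two topic centres instead of the 14 GDPR topics.
centroids = [(0.0, 0.0), (10.0, 10.0)]
sentences = [
    "We collect your email address.",
    "You may request erasure of your data.",
    "We also collect device identifiers.",
]
sentence_vecs = [(1.0, 0.0), (9.0, 10.0), (0.0, 2.0)]

labels = assign_to_centroids(sentence_vecs, centroids)
ssd = sum_of_squared_distance(sentence_vecs, centroids, labels)
summary = summarise(sentences, sentence_vecs, centroids)
```

Because the centres are fixed, the assignment is a single pass with no iterative refinement, and every cluster index keeps a stable meaning (one GDPR topic) across documents. A lower SSD indicates that sentences sit closer to the centres they are assigned to.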

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Access Control and Trust

Methods: k-Means Clustering · Pre-determined Centroid (PDC) Clustering