Is Your Model Sensitive? SPeDaC: A New Benchmark for Detecting and Classifying Sensitive Personal Data
Gaia Gambarelli, Aldo Gangemi, Rocco Tripodi

TL;DR
This paper introduces SPeDaC, a comprehensive benchmark dataset for detecting and classifying sensitive personal data in English, enabling better evaluation of models across multiple complexity levels.
Contribution
The paper presents SPeDaC, a new annotated dataset for sensitive data detection with three classification subtasks, filling a gap in benchmarking resources for this domain.
Findings
Transformer models outperform other classifiers.
Detection accuracy decreases with finer-grained classification.
SPeDaC is a challenging benchmark for current models.
Abstract
In recent years, there has been an exponential growth of applications, including dialogue systems, that handle sensitive personal information. This has brought to light the extremely important issue of personal data protection in virtual environments. Sensitive Information Detection (SID) has been approached across different domains and languages in the literature. However, in the personal data domain, the absence of a shared benchmark or an available labeled resource makes comparison with the state of the art difficult. We introduce and release SPeDaC, a new annotated resource for the identification of sensitive personal data categories in the English language. SPeDaC enables the evaluation of computational models for three different SID subtasks with increasing levels of complexity. SPeDaC 1 concerns binary classification: a model has to detect if a sentence contains sensitive information or…
Taxonomy
Topics: Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data · Digital and Cyber Forensics
Methods: Multi-Head Attention · Attention Is All You Need · DeBERTa · Linear Layer · Dropout · Linear Warmup With Linear Decay · Softmax · Residual Connection
