Comparing Feature-based and Context-aware Approaches to PII Generalization Level Prediction
Kailin Zhang, Xinying Qiu

TL;DR
This paper compares feature-based and context-aware methods for predicting the generalization level of PII in text, and reports that the context-aware approach, built on Multilingual-BERT, outperforms the feature-based one on the WikiReplace dataset.
Contribution
It introduces a novel context-aware framework that uses Multilingual-BERT representations to account for the broader context and the semantic relationships between the original text and its generalized candidates.
Findings
The context-aware approach outperforms the feature-based method on the WikiReplace dataset.
Multilingual BERT effectively captures semantic relationships for PII generalization.
Incorporating context improves privacy protection in text anonymization.
Abstract
Protecting Personally Identifiable Information (PII) in text data is crucial for privacy, but current PII generalization methods face challenges such as uneven data distributions and limited context awareness. To address these issues, we propose two approaches: a feature-based method using machine learning to improve performance on structured inputs, and a novel context-aware framework that considers the broader context and semantic relationships between the original text and generalized candidates. The context-aware approach employs Multilingual-BERT for text representation, functional transformations, and mean squared error scoring to evaluate candidates. Experiments on the WikiReplace dataset demonstrate the effectiveness of both methods, with the context-aware approach outperforming the feature-based one across different scales. This work contributes to advancing PII generalization…
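The candidate-scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function toy_embed is a hypothetical stand-in for a Multilingual-BERT sentence representation (a deterministic hashed bag-of-words, chosen only so the example runs without a model download). The "functional transformations" step is omitted because the abstract does not specify it. How the paper turns the mean squared error into a generalization-level decision is also not stated here, so the sketch only ranks candidates by their score.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a Multilingual-BERT text embedding.

    Builds a deterministic, L2-normalised vector from hashed tokens.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def score_candidates(text, span, candidates, embed=toy_embed):
    """Substitute each candidate for the PII span and score the result.

    The score is the MSE between the embedding of the original text and
    the embedding of the rewritten text (an assumed use of the metric).
    """
    original = embed(text)
    scores = {}
    for cand in candidates:
        rewritten = text.replace(span, cand)
        scores[cand] = mse(original, embed(rewritten))
    return scores


text = "She moved to Paris in 2010."
candidates = ["Paris", "a city in France", "a city in Europe"]
scores = score_candidates(text, "Paris", candidates)

# Print candidates from smallest to largest embedding distance.
for cand, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{score:.4f}  {cand}")
```

Passing a different embed callable, for example one that wraps a real encoder, would leave the scoring loop unchanged; that separation is the only design choice the sketch is meant to show.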
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications
