Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative Study
Dimitris Asimopoulos, Ilias Siniosoglou, Vasileios Argyriou, Sotirios K. Goudos, Konstantinos E. Psannis, Nikoleta Karditsioti, Theocharis Saoulidis, Panagiotis Sarigiannidis

TL;DR
This study compares various AI models like CRF, LSTM, ELMo, and Transformers to evaluate their effectiveness in protecting privacy through textual anonymization, highlighting their strengths and potential for improved data privacy solutions.
Contribution
It provides a comprehensive comparative analysis of multiple AI techniques for text anonymization, emphasizing their unique advantages and potential for enhancing privacy protection.
Findings
CRF, LSTM, and ELMo outperform traditional methods.
Transformers offer a broader perspective on anonymization.
Models demonstrate varying strengths in dependency modeling and scalability.
Abstract
In the digital era, with escalating privacy concerns, it is imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformers architecture. Each model presents unique strengths: LSTM models long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models, and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text…
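The models compared in the abstract are commonly used as sequence labellers: each token receives a tag, and tokens tagged as sensitive are then masked. The sketch below illustrates only that final masking step. It assumes a BIO tagging scheme with labels such as `PER` and `LOC`, and the function name `anonymize` and the example sentence are hypothetical. None of these details are taken from the paper, and the tags here are written by hand rather than predicted by a CRF, LSTM, ELMo, or Transformer model.

```python
from typing import List, Optional


def anonymize(tokens: List[str], tags: List[str]) -> str:
    """Replace each tagged entity span with a single placeholder like [PER].

    Assumes BIO tags ("B-X" starts an entity, "I-X" continues it, "O" is
    non-sensitive). This scheme is an illustrative assumption, not the
    paper's documented setup.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    out: List[str] = []
    prev_label: Optional[str] = None  # label of the entity currently open, if any
    for token, tag in zip(tokens, tags):
        if tag == "O":
            out.append(token)  # non-sensitive token is kept unchanged
            prev_label = None
            continue
        prefix, _, label = tag.partition("-")
        # Emit one placeholder per entity span: on "B-", or on an "I-" tag
        # that does not continue the currently open entity.
        if prefix == "B" or label != prev_label:
            out.append(f"[{label}]")
        prev_label = label
    return " ".join(out)


# Hand-written tags standing in for a model's predictions.
tokens = ["Maria", "Papadopoulou", "visited", "Thessaloniki", "yesterday", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
print(anonymize(tokens, tags))
```

In a setup like this, the choice of model affects only how the tags are produced. The masking step stays the same, which is one reason such models can be compared on the same anonymisation task.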
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Authorship Attribution and Profiling
Methods: Sigmoid Activation · Tanh Activation · Bidirectional LSTM · Long Short-Term Memory · Softmax · ELMo · Conditional Random Field
