Local Language Models for Context-Aware Adaptive Anonymization of Sensitive Text
Aisvarya Adeseye, Jouni Isoaho, Seppo Virtanen, Mohammad Tahir

TL;DR
This paper presents a context-aware, adaptive anonymization framework that uses local language models to protect privacy in qualitative research data and outperforms manual and rule-based methods.
Contribution
It introduces the Structured Framework for Adaptive Anonymizer (SFAA), which uses local LLMs to detect and anonymize sensitive data, improving both accuracy and consistency.
Findings
The Phi model detected over 91% of sensitive data.
Phi maintained 94.8% sentiment consistency.
LLMs identified more sensitive data than human reviewers.
Abstract
Qualitative research often contains personal, contextual, and organizational details that pose privacy risks if not handled appropriately. Manual anonymization is time-consuming, inconsistent, and frequently omits critical identifiers. Existing automated tools tend to rely on pattern matching or fixed rules, which fail to capture context and may alter the meaning of the data. This study uses local LLMs to build a reliable, repeatable, and context-aware anonymization process for detecting and anonymizing sensitive data in qualitative transcripts. We introduce a Structured Framework for Adaptive Anonymizer (SFAA) that includes three steps: detection, classification, and adaptive anonymization. The SFAA incorporates four anonymization strategies: rule-based substitution, context-aware rewriting, generalization, and suppression. These strategies are applied based on the identifier type and…
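The three SFAA steps (detection, classification, adaptive anonymization) and the four strategies named in the abstract can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the detection and classification steps, which SFAA performs with a local LLM, are replaced here by a hypothetical regex table (`PATTERNS`); the context-aware rewrite returns a fixed placeholder string instead of calling a model; and the mapping from identifier type to strategy (`STRATEGY_BY_KIND`) is invented for illustration, since the abstract's full selection criteria are truncated.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    start: int
    end: int
    text: str
    kind: str  # identifier type assigned during classification


# HYPOTHETICAL stand-in for SFAA's LLM-based detection + classification.
# Each pattern both finds a span (detection) and labels it (classification).
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "person": r"\b(?:Dr\.|Mr\.|Ms\.)\s[A-Z][a-z]+",
    "age": r"\b\d{2} years old\b",
    "organization": r"\bAcme Corp\b",
}


def detect_and_classify(text: str) -> List[Span]:
    spans = []
    for kind, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            spans.append(Span(m.start(), m.end(), m.group(), kind))
    return sorted(spans, key=lambda s: s.start)


def substitute(span: Span, state: Dict[str, Dict[str, int]]) -> str:
    # Rule-based substitution: the same original value always maps to the
    # same numbered placeholder, e.g. "Dr. Smith" -> "[PERSON_1]".
    seen = state.setdefault(span.kind, {})
    if span.text not in seen:
        seen[span.text] = len(seen) + 1
    return f"[{span.kind.upper()}_{seen[span.text]}]"


def rewrite(span: Span, state: Dict[str, Dict[str, int]]) -> str:
    # Context-aware rewriting: in SFAA this would be a local-LLM rewrite of
    # the span in context; a fixed phrase is a placeholder assumption here.
    return "their current employer"


def generalize(span: Span, state: Dict[str, Dict[str, int]]) -> str:
    # Generalization: replace an exact age with its decade.
    age = int(span.text.split()[0])
    return f"in their {age // 10 * 10}s"


def suppress(span: Span, state: Dict[str, Dict[str, int]]) -> str:
    # Suppression: remove the value entirely.
    return "[REDACTED]"


# HYPOTHETICAL identifier-type -> strategy mapping.
STRATEGY_BY_KIND = {
    "person": substitute,
    "organization": rewrite,
    "age": generalize,
    "email": suppress,
}


def anonymize(text: str) -> str:
    state: Dict[str, Dict[str, int]] = {}
    spans = detect_and_classify(text)
    # Choose replacements in reading order so placeholder numbering follows
    # the text; unknown types fall back to suppression (an assumption).
    replacements = [(s, STRATEGY_BY_KIND.get(s.kind, suppress)(s, state)) for s in spans]
    # Splice from the last span to the first so earlier offsets stay valid.
    for span, new in reversed(replacements):
        text = text[:span.start] + new + text[span.end:]
    return text


if __name__ == "__main__":
    print(anonymize("The participant, 42 years old, has worked at Acme Corp "
                    "since Dr. Smith hired them. Reach Dr. Smith at "
                    "smith@example.com today."))
    # -> The participant, in their 40s, has worked at their current employer
    #    since [PERSON_1] hired them. Reach [PERSON_1] at [REDACTED] today.
```

The shared `state` dictionary gives a repeated name the same placeholder throughout a transcript, which is one simple way to obtain the cross-document consistency the framework aims for; a real pipeline would swap the regex table and the fixed rewrite for calls to a local model.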
Taxonomy
Topics: Focus Groups and Qualitative Methods · Computational and Text Analysis Methods · Qualitative Research Methods and Applications
