rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm
Fabian Singhofer, Aygul Garifullina, Mathias Kern, Ansgar Scherp

TL;DR
rx-anon is a framework for anonymizing heterogeneous semi-structured data. It jointly anonymizes relational and textual attributes with a modified Mondrian algorithm, trading off privacy against information retention.
Contribution
It introduces a novel framework that anonymizes textual and relational data jointly. The framework uses a parameterized Mondrian algorithm whose weight parameter controls how strongly each attribute type influences the anonymization.
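Joining the two data types can be illustrated with a small sketch. Sensitive terms found in a record's text become extra attribute columns next to the relational ones, so the record can then be treated as a single row. This is not the authors' implementation. The lexicon lookup is a hypothetical stand-in for a real named-entity recognizer, and `extract_terms`, `map_text_to_columns` and the `text_<LABEL>` column names are illustrative. The sketch also assumes that a term counts as redundant sensitive information when it exactly matches a relational value of the same record.

```python
# Hypothetical lexicon standing in for a real named-entity recognizer.
ENTITY_LEXICON = {"Alice": "PERSON", "Bob": "PERSON", "Berlin": "LOC", "Paris": "LOC"}


def extract_terms(text):
    """Return {entity_label: set_of_terms} for lexicon words found in text."""
    found = {}
    for token in text.replace(",", " ").replace(".", " ").split():
        label = ENTITY_LEXICON.get(token)
        if label is not None:
            found.setdefault(label, set()).add(token)
    return found


def map_text_to_columns(record, text_key="text"):
    """Join sensitive text terms into a record as extra attribute columns.

    ASSUMPTION: a term is redundant if it equals a relational value of the
    same record. It is then recorded once, so both copies can be
    anonymized consistently.
    """
    relational = {k: v for k, v in record.items() if k != text_key}
    relational_values = {str(v) for v in relational.values()}
    joined = dict(relational)
    redundant = set()
    for label, terms in extract_terms(record[text_key]).items():
        redundant |= terms & relational_values
        # frozenset keeps the new column hashable for later grouping
        joined[f"text_{label}"] = frozenset(terms - relational_values)
    joined["redundant_terms"] = frozenset(redundant)
    return joined
```

For example, `map_text_to_columns({"name": "Alice", "city": "Berlin", "text": "Alice met Bob in Paris."})` yields `text_PERSON == frozenset({"Bob"})` and `text_LOC == frozenset({"Paris"})`, and it flags `"Alice"` as redundant. The name can then be generalized consistently in both the relational column and the text.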
Findings
Tuning the weight parameter reduces information loss.
Guarantees k-anonymity for both data types.
Evaluated on real-world datasets, with measurable privacy preservation.
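Once text terms live in columns, the k-anonymity claim can be checked directly: every combination of quasi-identifier values must occur in at least k rows. The following is a minimal sketch. It assumes the joined rows are dictionaries with hashable values (e.g., frozensets for mapped text terms). The function `is_k_anonymous` and any column names used with it are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Mapping, Sequence


def is_k_anonymous(rows: Sequence[Mapping[str, Hashable]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if each combination of quasi-identifier values occurs in >= k rows."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Each distinct tuple of quasi-identifier values is one equivalence class.
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in classes.values())
```

Including the mapped text columns among the quasi-identifiers is what lets the check cover both data types. A row whose text still names a rare location forms its own equivalence class and fails the check.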
Abstract
Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter allows giving different weights to the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world…
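Below is a minimal sketch of a weighted Mondrian partitioning. It assumes numeric attributes only. It also assumes that the weight parameter (here `lam`) scales each attribute's normalized span by `lam` for textual attributes and by `1 - lam` for relational ones when choosing where to cut. The paper's exact weighting and its handling of non-numeric text terms may differ. `weighted_mondrian`, `generalize` and the `mentions` column (a hypothetical numeric encoding of a textual attribute) are illustrative. Like classic Mondrian, it splits at the median and accepts a cut only if both halves keep at least k rows.

```python
from statistics import median


def normalized_span(rows, attr, full_span):
    """Width of attr within rows, relative to its width over the whole table."""
    values = [r[attr] for r in rows]
    width = max(values) - min(values)
    return width / full_span[attr] if full_span[attr] else 0.0


def weighted_mondrian(rows, attrs, textual, k, lam):
    """Split rows into partitions of at least k rows (strict median cuts).

    ASSUMPTION: textual attributes are weighted by lam and relational ones
    by 1 - lam when ranking candidate split dimensions.
    """
    rows = list(rows)
    if len(rows) < k:
        raise ValueError("need at least k rows")
    full_span = {a: max(r[a] for r in rows) - min(r[a] for r in rows) for a in attrs}

    def score(part, a):
        weight = lam if a in textual else 1.0 - lam
        return weight * normalized_span(part, a, full_span)

    def split(part):
        # Try dimensions from highest to lowest weighted span.
        for a in sorted(attrs, key=lambda attr: score(part, attr), reverse=True):
            if score(part, a) == 0:
                break  # remaining dimensions cannot be cut or carry no weight
            m = median(r[a] for r in part)
            left = [r for r in part if r[a] <= m]
            right = [r for r in part if r[a] > m]
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        return [part]  # no allowable cut: this partition is final

    return split(rows)


def generalize(partitions, attrs):
    """Replace each value by its partition's (min, max) range."""
    out = []
    for part in partitions:
        ranges = {a: (min(r[a] for r in part), max(r[a] for r in part)) for a in attrs}
        out.extend(dict(ranges) for _ in part)
    return out


# Hypothetical data: "mentions" stands in for a numerically encoded text attribute.
EXAMPLE_ROWS = [
    {"age": a, "mentions": m}
    for a, m in [(21, 0), (23, 5), (35, 1), (38, 6), (52, 2), (57, 7)]
]
```

With `EXAMPLE_ROWS`, `attrs=["age", "mentions"]`, `textual={"mentions"}` and k = 2, setting `lam = 0.0` cuts only on `age` and groups the ages {21, 23, 35} and {38, 52, 57}. Setting `lam = 1.0` cuts only on `mentions` and groups {21, 35, 52} and {23, 38, 57}. Shifting `lam` therefore decides which attribute type keeps finer detail after generalization.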
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Cryptography and Data Security
