An Open Source Python Library for Anonymizing Sensitive Data

Judith Sáinz-Pardo Díaz; Álvaro López García

arXiv:2408.10766 · cs.CR · August 21, 2024

TL;DR

This paper introduces an open source Python library designed to help researchers anonymize sensitive tabular data efficiently, supporting various anonymization techniques to facilitate open science while complying with data protection regulations.

Contribution

The paper presents a comprehensive Python framework for data anonymization that integrates multiple methods and follows best practices for development and testing, so that open data can be shared securely.

Findings

01

Provides a versatile set of anonymization methods

02

Supports compliance with data protection regulations

03

Ensures reliable and tested implementation

Abstract

Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied on the given dataset, including the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented…
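The inputs listed in the abstract (identifiers, quasi-identifiers, generalization hierarchies, allowed suppression level, and required level of anonymity) can be illustrated with a minimal k-anonymity sketch in plain Python. This is not the library's API: the function `k_anonymize`, its parameters, the toy records, and the hierarchies are all hypothetical and chosen only for illustration. The sensitive attribute (`disease` here) is left untouched, since k-anonymity constrains only the quasi-identifiers.

```python
from collections import Counter


def k_anonymize(records, identifiers, quasi_identifiers, hierarchies, k, max_suppression):
    """Hypothetical helper (not the library's API).

    Generalizes quasi-identifiers step by step until every equivalence class
    has at least k records, suppressing at most `max_suppression` (a fraction
    in [0, 1]) of the records.
    """
    # Identifiers (e.g. names) are dropped outright.
    originals = [{c: v for c, v in r.items() if c not in identifiers} for r in records]
    rows = [dict(r) for r in originals]
    # Current generalization level of each quasi-identifier (0 = raw value).
    levels = {qi: 0 for qi in quasi_identifiers}

    def class_key(row):
        return tuple(row[qi] for qi in quasi_identifiers)

    while True:
        sizes = Counter(class_key(r) for r in rows)
        kept = [r for r in rows if sizes[class_key(r)] >= k]
        # Stop once the records violating k fit in the suppression budget.
        if len(rows) - len(kept) <= max_suppression * len(rows):
            return kept
        candidates = [qi for qi in quasi_identifiers if levels[qi] < len(hierarchies[qi])]
        if not candidates:
            raise ValueError("k-anonymity not reachable with the given hierarchies")
        # Heuristic: generalize the attribute with the most distinct values.
        qi = max(candidates, key=lambda q: len({r[q] for r in rows}))
        generalize = hierarchies[qi][levels[qi]]
        levels[qi] += 1
        for row, original in zip(rows, originals):
            row[qi] = generalize(original[qi])


# Toy data, invented for this example.
records = [
    {"name": "Ana", "age": 23, "zip": "39005", "disease": "flu"},
    {"name": "Luis", "age": 27, "zip": "39012", "disease": "asthma"},
    {"name": "Eva", "age": 25, "zip": "39001", "disease": "flu"},
    {"name": "Juan", "age": 41, "zip": "28003", "disease": "diabetes"},
    {"name": "Marta", "age": 45, "zip": "28010", "disease": "flu"},
    {"name": "Pablo", "age": 48, "zip": "28001", "disease": "asthma"},
    {"name": "Sara", "age": 62, "zip": "08001", "disease": "diabetes"},
]

# Each hierarchy is a list of functions, from least to most general.
hierarchies = {
    "age": [lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}", lambda a: "*"],
    "zip": [lambda z: z[:2] + "***", lambda z: "*****"],
}

anonymized = k_anonymize(
    records,
    identifiers=["name"],
    quasi_identifiers=["age", "zip"],
    hierarchies=hierarchies,
    k=3,
    max_suppression=0.2,
)
```

With these toy inputs, ages are generalized to decades and ZIP codes to their two-digit prefix, and the single record that still falls in a class smaller than k is suppressed. A real library would typically search the generalization lattice more carefully to limit information loss; the greedy choice above is only one simple option.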

Code & Models

Repositories

ifca-advanced-computing/anjana
Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data