A New Method for Avoiding Data Disclosure While Automatically Preserving Multivariate Relations

Norman Matloff; Patrick Tendick

arXiv:1510.04406·stat.ME·November 3, 2015

Open Access

TL;DR

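This paper introduces a statistical disclosure limitation (SDL) method that preserves the multivariate structure of a data set, whether its variables are continuous, categorical, or mixed, so that the released data stay useful for statistical analysis while the risk of disclosing individual records is limited.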

Contribution

The paper presents a new SDL approach that automatically maintains the multivariate structure for mixed data types, addressing a key challenge in data privacy.

Findings

01

Method effectively preserves multivariate relationships.

02

Applicable to continuous, categorical, and mixed data.

03

Provides tools for data quality and risk assessment.

Abstract

Statistical disclosure limitation (SDL) methods aim to provide analysts general access to a data set while limiting the risk of disclosure of individual records. Many methods in the existing literature are aimed only at the case of univariate distributions, but the multivariate case is crucial, since most statistical analyses are multivariate in nature. Yet preserving the multivariate structure of the data can be challenging, especially when both continuous and categorical variables are present. Here we present a new SDL method that automatically attains the correct multivariate structure, regardless of whether the data are continuous, categorical or mixed. In addition, operational methods for assessing data quality and risk will be explored.

Taxonomy

Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Statistical Methods and Bayesian Inference