Marginality: a numerical mapping for enhanced treatment of nominal and hierarchical attributes
Josep Domingo-Ferrer

TL;DR
This paper introduces a numerical mapping for hierarchical nominal data to enable statistical analysis techniques like computing means and variances, improving data anonymization methods for categorical data.
Contribution
The paper proposes a novel numerical mapping for hierarchical nominal attributes, facilitating statistical analysis and enhancing privacy-preserving data publishing.
Findings
Enables calculation of means, variances, and covariances for hierarchical nominal data
Improves data anonymization techniques for categorical data
Expands options for statistical disclosure control methods
Abstract
The purpose of statistical disclosure control (SDC) of microdata, a.k.a. data anonymization or privacy-preserving data mining, is to publish data sets containing the answers of individual respondents in such a way that the respondents corresponding to the released records cannot be re-identified and the released data are analytically useful. SDC methods are either based on masking the original data, generating synthetic versions of them or creating hybrid versions by combining original and synthetic data. The choice of SDC methods for categorical data, especially nominal data, is much smaller than the choice of methods for numerical data. We mitigate this problem by introducing a numerical mapping for hierarchical nominal data which allows computing means, variances and covariances on them.
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Economic and Environmental Valuation · Bayesian Modeling and Causal Inference
