Complexity analysis and practical resolution of the data classification problem with private characteristics
David Pantoja, Ismael Rodriguez, Fernando Rubio, Clara Segura

TL;DR
This paper studies how to extract relevant information from individuals while preserving their privacy. It proves the problem NP-complete and proposes heuristic algorithms, including genetic algorithms, to solve it in practice.
Contribution
It introduces a formal framework for privacy-preserving data classification, proves NP-completeness, and develops heuristic algorithms, notably genetic algorithms, for effective problem resolution.
Findings
NP-completeness proven via reduction from Set Cover
Genetic algorithms outperform other heuristics in experiments
Hybrid greedy-genetic approach yields best results
Abstract
In this work we analyze the following problem: given the probability distribution of a population, question an unknown individual who is representative of that distribution so that our uncertainty about certain characteristics is significantly reduced, but our uncertainty about others, deemed private or sensitive, is not. The goal is thus to extract information relevant to a legitimate purpose while preserving the privacy of individuals, which is crucial for enabling non-intrusive selection processes in several areas. For instance, it is essential in the design of non-discriminatory personnel selection, promotion, and layoff processes in companies and institutions; in the retrieval of customer information that is relevant to the service provided by a company (and no more); in certifications that do not reveal sensitive industrial information that is irrelevant for the…
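The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's formal definition: it assumes Shannon entropy as the measure of uncertainty, and the population, the attribute names (`skill`, `health`), and the single yes/no question are all hypothetical. It computes how much one question reduces uncertainty about a relevant characteristic and about a private one.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, index):
    """Marginal distribution of one coordinate of the joint distribution."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[index]] += p
    return dict(out)


def conditional_entropy(joint, target, given):
    """H(target | given): expected remaining uncertainty after the answer."""
    by_answer = defaultdict(lambda: defaultdict(float))
    for outcome, p in joint.items():
        by_answer[outcome[given]][outcome[target]] += p
    total = 0.0
    for dist in by_answer.values():
        p_answer = sum(dist.values())
        normalized = {k: v / p_answer for k, v in dist.items()}
        total += p_answer * entropy(normalized)
    return total


def uncertainty_reduction(joint, target, given):
    """Bits of uncertainty about `target` removed by observing `given`."""
    return entropy(marginal(joint, target)) - conditional_entropy(joint, target, given)


# Hypothetical population: each outcome is (skill, health, answer_to_question).
# The answer is fully determined by skill and is independent of health.
SKILL, HEALTH, ANSWER = 0, 1, 2
population = {
    ("high", "good", "yes"): 0.25,
    ("high", "poor", "yes"): 0.25,
    ("low", "good", "no"): 0.25,
    ("low", "poor", "no"): 0.25,
}

relevant_gain = uncertainty_reduction(population, SKILL, ANSWER)
private_leak = uncertainty_reduction(population, HEALTH, ANSWER)
```

In this toy population the question removes the full bit of uncertainty about `skill` and none about `health`, which is the ideal case. With a realistic distribution the two quantities are correlated, and choosing the questions that balance them is the hard part of the problem.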
Taxonomy
Topics: Game Theory and Voting Systems · Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications
