On the Approximability of Geometric and Geographic Generalization and the Min-Max Bin Covering Problem
Wenliang Du, David Eppstein, Michael T. Goodrich, George S. Lueker

TL;DR
This paper investigates the limits of approximating data generalization for privacy, proving hardness results and providing approximation algorithms that come close to optimal, with a novel connection to the min-max bin covering problem.
Contribution
It establishes the computational hardness of approximating certain data generalization problems and introduces approximation algorithms, linking the problem to a new bin-packing variant, min-max bin covering.
Findings
Hardness results for approximating generalization with geographic or unordered attributes
Approximation algorithms that perform close to optimal in practice
Introduction of the min-max bin covering problem as a new related challenge
Abstract
We study the problem of abstracting a table of data about individuals so that no selection query can identify fewer than k individuals. We show that it is impossible to achieve arbitrarily good polynomial-time approximations for a number of natural variations of the generalization technique, unless P = NP, even when the table has only a single quasi-identifying attribute that represents a geographic or unordered attribute:
Zip-codes: nodes of a planar graph generalized into connected subgraphs
GPS coordinates: points in R^2 generalized into non-overlapping rectangles
Unordered data: text labels that can be grouped arbitrarily
In addition to impossibility results, we provide approximation algorithms for these difficult single-attribute generalization problems, which, of course, apply to multiple-attribute instances with one that is quasi-identifying. We show theoretically and…
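To make the unordered-data case concrete, the sketch below treats each text label as an item whose size is the number of records carrying it, and groups labels so that every group covers at least k records while trying to keep the largest group small. This is one reading of the min-max bin covering objective, stated here as an assumption. The greedy rule, the function names `greedy_min_max_cover` and `max_bin_size`, and the sample counts are all hypothetical illustrations; this is not the paper's algorithm and carries no approximation guarantee.

```python
from typing import Dict, List


def greedy_min_max_cover(counts: Dict[str, int], k: int) -> List[List[str]]:
    """Naive heuristic (an illustration, not the paper's method).

    Groups labels so that each group holds at least k records,
    then spreads any leftover labels over the lightest groups.
    """
    if sum(counts.values()) < k:
        raise ValueError("fewer than k records in total; no valid grouping exists")

    bins: List[List[str]] = []   # closed groups of labels
    totals: List[int] = []       # record count of each closed group
    current: List[str] = []      # group currently being filled
    current_total = 0

    # Visit labels from most to least frequent (ties broken by name).
    for label, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        current.append(label)
        current_total += c
        if current_total >= k:   # group now covers k records: close it
            bins.append(current)
            totals.append(current_total)
            current, current_total = [], 0

    # Labels left in an unfinished group go, one by one, to the lightest group.
    for label in current:
        i = min(range(len(totals)), key=totals.__getitem__)
        bins[i].append(label)
        totals[i] += counts[label]
    return bins


def max_bin_size(counts: Dict[str, int], bins: List[List[str]]) -> int:
    """Objective value: record count of the largest group."""
    return max(sum(counts[label] for label in group) for group in bins)


# Hypothetical label frequencies and anonymity threshold.
counts = {"A": 5, "B": 3, "C": 2, "D": 2, "E": 1}
bins = greedy_min_max_cover(counts, k=4)
print(bins, max_bin_size(counts, bins))
```

On this sample the heuristic returns two groups with a largest size of 7, whereas the grouping {A}, {B, E}, {C, D} also satisfies k = 4 with a largest size of 5. The gap shows why a simple greedy rule is not enough and why approximation quality is the central question.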
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs · Optimization and Search Problems
