The $k$-anonymity Problem is Hard
Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi

TL;DR
This paper proves that the k-anonymity problem, which asks for the rows of a table to be clustered into groups of at least k rows while suppressing as few entries as possible, is APX-hard even in two heavily restricted settings, highlighting its intrinsic complexity.
Contribution
It establishes APX-hardness for specific restricted cases of the k-anonymity problem, demonstrating its computational difficulty beyond previously known NP-hardness.
Findings
The k-anonymity problem is APX-hard over a binary alphabet with k = 3.
The k-anonymity problem is APX-hard when records have length at most 8 and k = 4.
Hardness therefore persists even when the alphabet size or the record length is tightly restricted.
Abstract
The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization recently proposed is k-anonymity. This approach requires that the rows in a table are clustered in sets of size at least k and that all the rows in a cluster become the same tuple, after the suppression of some records. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be NP-hard when the values are over a ternary alphabet, k = 3 and the row length is unbounded. In this paper we give a lower bound on the approximation factor that any polynomial-time algorithm can achieve on two restrictions of the problem, namely (i) when the record values are over a binary alphabet and k = 3, and (ii) when the records have length at most 8 and k = 4, showing that these restrictions of the problem are APX-hard.
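To make the objective concrete, here is a minimal Python sketch of how the cost of one candidate clustering could be evaluated. It is an illustration only, not code from the paper: the function names (`cluster_cost`, `partition_cost`, `anonymize`), the use of `*` as the suppression symbol, and the sample table are all assumptions. The sketch scores a given partition; it does not search for an optimal one, which is the part the paper shows to be hard.

```python
def cluster_cost(rows):
    """Entries suppressed when all rows of one cluster are made identical.

    A column on which the rows disagree must be suppressed in every row
    of the cluster, so it contributes len(rows) suppressed entries.
    """
    width = len(rows[0])
    disagreeing = sum(1 for j in range(width) if len({r[j] for r in rows}) > 1)
    return disagreeing * len(rows)


def partition_cost(table, clusters, k):
    """Total suppressed entries for a partition given as lists of row indices."""
    covered = sorted(i for c in clusters for i in c)
    if covered != list(range(len(table))):
        raise ValueError("clusters must partition the rows of the table")
    if any(len(c) < k for c in clusters):
        raise ValueError("every cluster must contain at least k rows")
    return sum(cluster_cost([table[i] for i in c]) for c in clusters)


def anonymize(table, clusters):
    """Return the published table, with '*' marking suppressed entries (assumed symbol)."""
    out = list(table)
    for c in clusters:
        rows = [table[i] for i in c]
        keep = [len({r[j] for r in rows}) == 1 for j in range(len(rows[0]))]
        for i in c:
            out[i] = "".join(ch if keep[j] else "*" for j, ch in enumerate(table[i]))
    return out


# Hypothetical binary table with six records of length 4, and k = 3.
table = ["0011", "0010", "0111", "1100", "1101", "1000"]
two_clusters = [[0, 1, 2], [3, 4, 5]]
one_cluster = [[0, 1, 2, 3, 4, 5]]

cost_two = partition_cost(table, two_clusters, k=3)
cost_one = partition_cost(table, one_cluster, k=3)
published = anonymize(table, two_clusters)
```

On this sample table, the two-cluster partition suppresses 12 entries while placing all six rows in one cluster suppresses 24, which shows why the choice of clustering matters. An exhaustive search over all valid partitions would find the optimum for tiny inputs, but the number of partitions grows too quickly for that to scale.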
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Internet Traffic Analysis and Secure E-voting
