A Determination Scheme for Quasi-Identifiers Using Uniqueness and Influence for De-Identification of Clinical Data
Jipmin Jung, Phillip Park, Jaedong Lee, Hyein Lee, Geonkook Lee, and Hyosoung Cha

TL;DR
This paper proposes a systematic method for selecting quasi-identifiers in clinical data to enhance de-identification, balancing data utility and privacy protection based on uniqueness and influence measures.
Contribution
It introduces a novel multi-step approach for identifying and classifying quasi-identifiers tailored to user classes, improving data anonymization processes.
Findings
Final QI sets ranged from 18 to 28 attributes.
The method enables objective selection of QIs for secure data sharing.
Researchers can apply this method for effective de-identification of clinical data.
Abstract
Objectives: The accumulation and usefulness of clinical data have increased with IT development. While using clinical data that needs to be identifiable to obtain meaningful information, it is essential to ensure that data is de-identified and unnecessary clinical information is minimized to protect personal information. This process requires criteria and an appropriate method, as there are clear identifiers as well as quasi-identifiers that are not readily identifiable. Methods: To formulate such a method, first, primary quasi-identifiers were selected by classifying information in 20 clinical personal information database tables into Direct-Identifier (DID), Quasi-Identifier (QI), Sensitive Attribute (SA), and Non-Sensitive Attribute (NSA) according to its type. Secondary QIs were then selected by assessing the risk for outliers by measuring uniqueness values of the selected data and…
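To make the idea of a uniqueness value concrete, the following is a minimal sketch and not the authors' implementation. It assumes uniqueness means the fraction of records whose combination of values on a candidate attribute set occurs exactly once in the table; the paper's exact definition may differ. The function name `uniqueness`, the attribute names, and the toy records are all hypothetical.

```python
from collections import Counter


def uniqueness(records, attributes):
    """Fraction of records whose values on `attributes` occur exactly once.

    Assumed definition for illustration; the paper may define it differently.
    """
    if not records:
        return 0.0
    # Build one key per record from the candidate quasi-identifier attributes.
    keys = [tuple(rec[a] for a in attributes) for rec in records]
    counts = Counter(keys)
    # A record is "unique" if no other record shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy records; not taken from the paper's data.
records = [
    {"sex": "F", "birth_year": 1980, "zip": "10321"},
    {"sex": "F", "birth_year": 1980, "zip": "10321"},
    {"sex": "M", "birth_year": 1975, "zip": "10321"},
    {"sex": "M", "birth_year": 1990, "zip": "04524"},
]

u_sex = uniqueness(records, ["sex"])
u_sex_birth = uniqueness(records, ["sex", "birth_year"])
u_zip = uniqueness(records, ["zip"])
```

Under this assumed definition, an attribute set with a higher value singles out more individuals, so it would be a stronger candidate for treatment as a quasi-identifier. Combining attributes can only keep or raise the value, which is why combinations are worth checking and not only single columns.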
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Artificial Intelligence in Healthcare and Education
