Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies
Anisa Halimi, Leonard Dervishi, Erman Ayday, Apostolos Pyrgelis, Juan Ramon Troncoso-Pastoriza, Jean-Pierre Hubaux, Xiaoqian Jiang, Jaideep Vaidya

TL;DR
This paper introduces a framework for verifying the correctness of genome-wide association study (GWAS) results while protecting the privacy of individuals in the research dataset through local differential privacy, supporting reproducibility and auditability.
Contribution
The work presents a novel privacy-preserving verification framework that uses partial noisy datasets and public data to confirm the accuracy of GWAS results without compromising privacy.
Findings
High accuracy in verifying GWAS results with small variant statistics
Privacy leakage remains within acceptable bounds
Framework effective with real genomic data
Abstract
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the statistical test results that are conducted by a researcher while protecting individuals' privacy in the researcher's dataset. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another statistical study (using publicly available datasets) to distinguish between correct statistics and…
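As an illustration of what a "partial noisy dataset" satisfying local differential privacy could look like, the sketch below applies k-ary randomized response to genotype values. This is a minimal example under stated assumptions, not the authors' mechanism: the encoding of genotypes as 0, 1, 2, the function names `randomized_response` and `perturb_genotypes`, and the choice of randomized response itself are all hypothetical and are not taken from the abstract.

```python
import math
import random


def randomized_response(value, epsilon, domain=(0, 1, 2), rng=random):
    """k-ary randomized response over `domain` (assumed genotype coding 0/1/2).

    The true value is kept with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 values is reported uniformly at random.
    """
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def perturb_genotypes(genotypes, epsilon, seed=None):
    """Perturb each genotype independently (hypothetical helper)."""
    rng = random.Random(seed)
    return [randomized_response(g, epsilon, rng=rng) for g in genotypes]


if __name__ == "__main__":
    true_values = [0, 1, 2, 1, 0, 2, 2, 1]
    # Smaller epsilon means more noise and stronger privacy.
    print(perturb_genotypes(true_values, epsilon=1.0, seed=42))
```

Each value is perturbed independently, so a researcher could release only a subset of variants this way while keeping the original dataset private. How the subset is chosen and how the verifier uses it are not specified in the text above.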
Taxonomy
Topics: Scientific Computing and Data Management · Privacy-Preserving Technologies in Data · Cancer Genomics and Diagnostics
