Private DNA Sequencing: Hiding Information in Discrete Noise
Kayvon Mazooji, Roy Dong, Ilan Shomorony

TL;DR
This paper explores a privacy-preserving method for DNA sequencing in which DNA samples with known genetic content are mixed in as noise. It analyzes which noise distribution maximizes privacy while still allowing individuals to recover their own genetic information.
Contribution
It introduces a formal framework for hiding genetic information using additive discrete noise and derives bounds on the optimal noise distribution for privacy enhancement.
Findings
The upper and lower bounds characterized for the worst-case noise distribution are close to each other.
A convex relaxation provides a closed-form lower bound.
A greedy algorithm approximates the upper bound effectively.
Abstract
When an individual's DNA is sequenced, sensitive medical information becomes available to the sequencing laboratory. A recently proposed way to hide an individual's genetic information is to mix in DNA samples of other individuals. We assume that the genetic content of these samples is known to the individual but unknown to the sequencing laboratory. Thus, these DNA samples act as "noise" to the sequencing laboratory, but still allow the individual to recover their own DNA samples afterward. Motivated by this idea, we study the problem of hiding a binary random variable (a genetic marker) with the additive noise provided by mixing DNA samples, using mutual information as a privacy metric. This is equivalent to the problem of finding a worst-case noise distribution for recovering the binary variable from the noisy observation among a set of feasible discrete distributions. We characterize upper and…
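To make the privacy metric concrete, the sketch below computes the mutual information between a binary marker X and the observation Y = X + Z for a given discrete noise distribution. It is only an illustration of the quantity described in the abstract: the function name `mutual_information`, the parameters `p_x1` and `noise_pmf`, and the assumption that Z is independent of X and takes values in {0, 1, ..., n} are all hypothetical choices, and the code does not reproduce the paper's bounds, convex relaxation, or greedy algorithm.

```python
import math


def mutual_information(p_x1, noise_pmf):
    """Return I(X; Y) in bits for Y = X + Z.

    Assumptions (illustrative, not taken from the paper):
    X ~ Bernoulli(p_x1), Z is independent of X, and
    noise_pmf[z] = P(Z = z) for z = 0, 1, ..., len(noise_pmf) - 1.
    """
    p_x = [1.0 - p_x1, p_x1]

    # Marginal of the observation: p(y) = sum_x p(x) * P(Z = y - x).
    p_y = [0.0] * (len(noise_pmf) + 1)
    for x in (0, 1):
        for z, pz in enumerate(noise_pmf):
            p_y[x + z] += p_x[x] * pz

    # I(X; Y) = sum_{x, y} p(x, y) * log2(p(y | x) / p(y)),
    # where p(y | x) = P(Z = y - x) and p(x, y) = p(x) * P(Z = y - x).
    mi = 0.0
    for x in (0, 1):
        for z, pz in enumerate(noise_pmf):
            joint = p_x[x] * pz
            if joint > 0.0:
                mi += joint * math.log2(pz / p_y[x + z])
    return mi


no_noise = mutual_information(0.5, [1.0])        # Z is always 0
fair_noise = mutual_information(0.5, [0.5, 0.5])  # Z uniform on {0, 1}
print(no_noise, fair_noise)
```

With no noise, the laboratory learns the marker completely, so the mutual information equals the entropy of X. Adding noise lowers it, and a lower value means less information about the marker is leaked through the observation.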
