The Extreme Risk of Personal Data Breaches & The Erosion of Privacy
Spencer Wheatley, Thomas Maillart, and Didier Sornette

TL;DR
This paper analyzes the growing risk and scale of personal data breaches. Breach sizes follow a heavy-tailed distribution, the amount of breached data is projected to keep growing, and breached information accumulating in the hands of cyber criminals drives a cumulative erosion of privacy.
Contribution
It fits an extremely heavy-tailed truncated Pareto distribution to breach sizes from 2000 through 2015 and projects future breach growth, highlighting the escalating privacy risks.
Findings
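Current maximum breach size of about 200 million items, expected to grow by fifty percent over the next five years
Breach sizes follow an extremely heavy-tailed truncated Pareto distribution, with the tail exponent decreasing linearly from 0.57 in 2007 to 0.37 in 2015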
Total breached information expected to double in five years
Abstract
Personal data breaches from organisations, enabling mass identity fraud, constitute an extreme risk. This risk worsens daily as an ever-growing amount of personal data is stored by organisations and online, and the attack surface surrounding this data becomes larger and harder to secure. Further, breached information is distributed and accumulates in the hands of cyber criminals, thus driving a cumulative erosion of privacy. Statistical modeling of breach data from 2000 through 2015 provides insights into this risk: a current maximum breach size of about 200 million is detected, and is expected to grow by fifty percent over the next five years. The breach sizes are found to be well modeled by an extremely heavy-tailed truncated Pareto distribution, with tail exponent parameter decreasing linearly from 0.57 in 2007 to 0.37 in 2015. With this current model, given a breach…
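To make the model in the abstract concrete, here is a minimal Python sketch of a truncated Pareto distribution together with a linear interpolation of the tail exponent between the two reported endpoints (0.57 in 2007, 0.37 in 2015). It is an illustration under stated assumptions, not the authors' code: the function names are invented here, the lower threshold of 10,000 items is a hypothetical placeholder (the abstract above is cut off before any threshold is stated), and the upper truncation point is simply set to the reported maximum of about 200 million.

```python
import random

# Values taken from the abstract.
UPPER = 200_000_000          # reported current maximum breach size (~200 million)
ALPHA_2007, ALPHA_2015 = 0.57, 0.37

# ASSUMPTION: hypothetical lower threshold, chosen only for illustration.
LOWER = 10_000


def tail_exponent(year):
    """Linearly interpolate the tail exponent between the 2007 and 2015 values."""
    slope = (ALPHA_2015 - ALPHA_2007) / (2015 - 2007)  # -0.025 per year
    return ALPHA_2007 + slope * (year - 2007)


def truncated_pareto_cdf(x, alpha, lower=LOWER, upper=UPPER):
    """CDF of a Pareto distribution truncated to the interval [lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    norm = 1.0 - (lower / upper) ** alpha
    return (1.0 - (lower / x) ** alpha) / norm


def truncated_pareto_quantile(p, alpha, lower=LOWER, upper=UPPER):
    """Inverse CDF: the breach size below which a fraction p of breaches fall."""
    norm = 1.0 - (lower / upper) ** alpha
    return lower * (1.0 - p * norm) ** (-1.0 / alpha)


def sample_breach_sizes(n, alpha, seed=0):
    """Draw n breach sizes by inverse-transform sampling."""
    rng = random.Random(seed)
    return [truncated_pareto_quantile(rng.random(), alpha) for _ in range(n)]


if __name__ == "__main__":
    alpha = tail_exponent(2015)
    sizes = sample_breach_sizes(5, alpha)
    print([round(s) for s in sizes])
    # Probability that a breach above LOWER exceeds 10 million items,
    # under the assumed threshold (not a figure from the paper).
    print(1.0 - truncated_pareto_cdf(10_000_000, alpha))
```

A smaller tail exponent means a heavier tail, so the decline from 0.57 to 0.37 shifts more probability toward very large breaches. The upper truncation keeps every sampled size at or below the assumed maximum.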
