Exploring the Impact of Password Dataset Distribution on Guessing
Hazel Murray, David Malone

TL;DR
This paper demonstrates that leaking a small subset of passwords from a dataset can enable attackers to guess a large portion of the remaining passwords by exploiting distributional similarities, highlighting risks in password data leaks.
Contribution
It introduces a theoretical and empirical framework to show how password distribution knowledge from a small sample can compromise the entire dataset, and proposes a measure for distribution similarity.
Findings
Sample leaks reveal distributional patterns that aid in guessing remaining passwords.
A sample drawn from the same dataset predicts that dataset's passwords better than a sample from an unrelated source.
Leaked small samples can significantly increase the risk of large-scale password compromise.
Abstract
Leaks from password datasets are a regular occurrence. An organization may defend a leak with reassurances that just a small subset of passwords was taken. In this paper we show that the leak of a relatively small number of text-based passwords from an organization's stored dataset can lead to a further large collection of users being compromised. Taking a sample of passwords from a given dataset of passwords, we exploit the knowledge we gain of the distribution to guess other samples from the same dataset. We show theoretically and empirically that the distribution of passwords in the sample follows the same distribution as the passwords in the whole dataset. We propose a function that measures the ability of one distribution to estimate another. Leveraging this we show that a sample of passwords leaked from a given dataset will compromise the remaining passwords in that dataset…
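The idea of guessing from a leaked sample can be illustrated with a minimal sketch. This is not the authors' code or their proposed measure: the function names (`guess_order`, `fraction_compromised`, `make_toy_dataset`), the synthetic Zipf-like data, and the guess budget are all assumptions made for illustration. It ranks the passwords in a small "leaked" sample by frequency and counts how many of the remaining users would fall to the first few guesses.

```python
import random
from collections import Counter


def guess_order(sample):
    """Rank the distinct passwords in a sample from most to least frequent."""
    return [pw for pw, _ in Counter(sample).most_common()]


def fraction_compromised(guesses, targets, budget):
    """Fraction of target users whose password is among the first `budget` guesses."""
    tried = set(guesses[:budget])
    return sum(1 for pw in targets if pw in tried) / len(targets)


def make_toy_dataset(n_users, n_distinct, seed=0):
    """Hypothetical stand-in for a real dataset: choices weighted roughly as 1/rank."""
    rng = random.Random(seed)
    vocab = [f"pw{i}" for i in range(n_distinct)]
    weights = [1.0 / (i + 1) for i in range(n_distinct)]
    return rng.choices(vocab, weights=weights, k=n_users)


# Split a synthetic dataset into a small leaked sample and the remaining users.
dataset = make_toy_dataset(n_users=20000, n_distinct=5000)
random.Random(1).shuffle(dataset)
leaked, remaining = dataset[:1000], dataset[1000:]

# Guess the remaining users' passwords in the order suggested by the leak.
guesses = guess_order(leaked)
rate = fraction_compromised(guesses, remaining, budget=100)
print(f"Share of remaining users compromised within 100 guesses: {rate:.2%}")
```

Because the leaked sample is drawn from the same dataset, its most frequent passwords tend to be the most frequent passwords overall, which is why a short guess list built from the sample covers a large share of the remaining users in this toy setting. Real datasets would need the same split applied to actual password counts.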
