The Evolution of User-Selected Passwords: A Quantitative Analysis of Publicly Available Datasets
Theodosis Mourouzis, Kyriacos E. Pavlou, Stylianos Kampakis

TL;DR
This study analyzes how user password choices have evolved over time using publicly available datasets, revealing improvements in password quality but also persistent bad practices.
Contribution
It provides a quantitative analysis of password evolution, highlighting trends and ongoing issues in user password selection over several years.
Findings
Passwords in more recent datasets are less similar to the benchmark set of known bad passwords.
Password quality has improved in terms of length and character diversity.
Some discouraged practices, such as including names in passwords, persist.
Abstract
The aim of this work is to study the evolution of password selection among users. We investigate whether users follow best practices when selecting passwords and identify areas in need of improvement. Four distinct publicly-available password datasets (obtained from security breaches, compiled by security experts, and designated as containing bad passwords) are employed. As these datasets were released at different times, the distributions characterizing these datasets suggest a chronological evolution of password selection. A similarity metric, Levenshtein distance, is used to compare passwords in each dataset against the designated benchmark of bad passwords. The resulting distributions of normalized similarity scores are then compared to each other. The comparison reveals an overall increase in the mean of the similarity distributions corresponding to more recent datasets,…
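The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not spell out the details, that a score is normalized as 1 - distance / (length of the longer string) and that each password is scored against its closest match in the benchmark. The function names and the three-entry `BAD_PASSWORDS` list are hypothetical and stand in for the actual benchmark dataset; the authors' normalization may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def closest_bad_similarity(password: str, bad_passwords: list[str]) -> float:
    """Assumed aggregation: score against the most similar benchmark entry."""
    return max(normalized_similarity(password, bad) for bad in bad_passwords)


# Hypothetical stand-in for the benchmark of bad passwords.
BAD_PASSWORDS = ["password", "123456", "qwerty"]

if __name__ == "__main__":
    for candidate in ["passw0rd", "Tr0ub4dor&3"]:
        print(candidate, closest_bad_similarity(candidate, BAD_PASSWORDS))
```

Under these assumptions, `passw0rd` scores 0.875 against the benchmark because it is one substitution away from the eight-character `password`. Collecting such scores for every password in a dataset yields the kind of per-dataset distribution that the abstract compares across release dates.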
Taxonomy
Topics: User Authentication and Security Systems · Spam and Phishing Detection · Advanced Malware Detection Techniques
