When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking

Mohammad Abdul Rehman; Syed Imad Ali Shah; Abbas Anwar; Noor Islam; Hamid Khan

arXiv:2510.17884 · cs.CR · January 1, 2026

TL;DR

This study empirically evaluates the ability of large language models to crack passwords and finds they perform poorly compared to traditional methods, highlighting their limitations in domain-specific security tasks.

Contribution

The paper provides an empirical analysis of how pre-trained LLMs perform at password cracking, exposing their current limitations and the need for domain-specific fine-tuning.

Findings

01. LLMs achieve less than 1.5% accuracy at Hit@10 in password guessing.

02. Traditional rule-based and combinator-based methods significantly outperform LLMs.

03. LLMs lack effective domain adaptation and memorization for password inference.

Abstract

The remarkable capabilities of Large Language Models (LLMs) in natural language understanding and generation have sparked interest in their potential for cybersecurity applications, including password guessing. In this study, we conduct an empirical investigation into the efficacy of pre-trained LLMs for password cracking using synthetic user profiles. Specifically, we evaluate the performance of state-of-the-art open-source LLMs such as TinyLLaMA, Falcon-RW-1B, and Flan-T5 by prompting them to generate plausible passwords based on structured user attributes (e.g., name, birthdate, hobbies). Our results, measured using Hit@1, Hit@5, and Hit@10 metrics under both plaintext and SHA-256 hash comparisons, reveal consistently poor performance, with all models achieving less than 1.5% accuracy at Hit@10. In contrast, traditional rule-based and combinator-based cracking methods demonstrate…
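
As an illustration of the metrics named in the abstract, the sketch below shows one way Hit@k could be computed under plaintext and SHA-256 comparison. It is a minimal sketch and not the authors' evaluation code: the function names (`sha256_hex`, `hit_at_k`, `hit_rate`), the UTF-8 encoding, the unsalted hex-digest comparison, and the toy guesses and targets are all assumptions made for illustration.

```python
import hashlib
from typing import Sequence


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hit_at_k(guesses: Sequence[str], target: str, k: int, hashed: bool = False) -> bool:
    """Return True if any of the first k guesses matches the target.

    When hashed is True, target is assumed to be an unsalted SHA-256 hex
    digest, and each guess is hashed before it is compared.
    """
    top_k = guesses[:k]
    if hashed:
        return any(sha256_hex(guess) == target for guess in top_k)
    return target in top_k


def hit_rate(
    all_guesses: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
    hashed: bool = False,
) -> float:
    """Fraction of profiles whose target appears among the top-k guesses."""
    if not targets:
        return 0.0
    hits = sum(hit_at_k(g, t, k, hashed) for g, t in zip(all_guesses, targets))
    return hits / len(targets)


# Toy data (hypothetical): ranked guesses for two synthetic profiles.
guesses = [
    ["alice1990", "Alice!23", "chess1990"],
    ["bob123", "password", "bobby2001"],
]
targets = ["chess1990", "hunter2"]
hashed_targets = [sha256_hex(t) for t in targets]

print(hit_rate(guesses, targets, k=1))                      # 0.0
print(hit_rate(guesses, targets, k=3))                      # 0.5
print(hit_rate(guesses, hashed_targets, k=3, hashed=True))  # 0.5
```

With an unsalted hash, the plaintext and hashed comparisons agree on every guess (barring hash collisions), so the hashed mode mainly mirrors a setting where the evaluator holds only digests.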
