CIPHER: Cryptographic Insecurity Profiling via Hybrid Evaluation of Responses

Max Manolov; Tony Gao; Siddharth Shukla; Cheng-Ting Chou; and Ryan Lagasse

arXiv:2602.01438 · cs.CR · February 9, 2026

Open Access

TL;DR

CIPHER is a benchmark that evaluates cryptographic vulnerabilities in LLM-generated Python code, revealing that even explicit secure prompts do not fully eliminate security flaws.

Contribution

The paper introduces CIPHER, a novel benchmark with a cryptography-specific vulnerability taxonomy and automated scoring to assess LLM security in cryptographic code.

Findings

01 Secure prompting reduces some vulnerabilities

02 Vulnerabilities persist despite explicit security instructions

03 Benchmark and scoring pipeline will be publicly available

Abstract

Large language models (LLMs) are increasingly used to assist developers with code, yet their implementations of cryptographic functionality often contain exploitable flaws. Minor design choices (e.g., static initialization vectors or missing authentication) can silently invalidate security guarantees. We introduce CIPHER (Cryptographic Insecurity Profiling via Hybrid Evaluation of Responses), a benchmark for measuring cryptographic vulnerability incidence in LLM-generated Python code under controlled security-guidance conditions. CIPHER uses insecure/neutral/secure prompt variants per task, a cryptography-specific vulnerability taxonomy, and line-level attribution via an automated scoring pipeline. Across a diverse set of widely used LLMs, we find that explicit secure prompting reduces some targeted issues but does not reliably eliminate cryptographic vulnerabilities overall. The…
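To make the idea of line-level attribution across prompt variants concrete, here is a minimal sketch in Python. It is not the paper's scoring pipeline: the rule names, the regular expressions in `RULES`, the `Finding` record, and the hand-written snippets in `SAMPLES` are all hypothetical, chosen only to show how a flagged line can be tied to a vulnerability category and how findings might be tallied per insecure/neutral/secure variant.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: category names and patterns are illustrative
# assumptions, not the taxonomy defined in the paper.
RULES = {
    "static_iv": re.compile(r"\b(iv|nonce)\s*=\s*b[\"']"),
    "ecb_mode": re.compile(r"\bMODE_ECB\b|\bmodes\.ECB\b"),
    "weak_hash": re.compile(r"\bhashlib\.(md5|sha1)\b"),
}


@dataclass(frozen=True)
class Finding:
    line_no: int   # 1-based line number in the scanned source
    category: str  # key of the rule that matched
    text: str      # the offending line, stripped of surrounding whitespace


def scan(source: str) -> list[Finding]:
    """Return one Finding per (line, rule) match in the given source text."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for category, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(line_no, category, line.strip()))
    return findings


def incidence_by_variant(samples: dict[str, str]) -> dict[str, int]:
    """Count findings for each prompt variant's code sample."""
    return {variant: len(scan(code)) for variant, code in samples.items()}


# Hand-written toy snippets standing in for model responses; they are
# scanned as text only and never executed.
SAMPLES = {
    "insecure": 'iv = b"0000000000000000"\ncipher = AES.new(key, AES.MODE_ECB)\n',
    "neutral": 'iv = b"0000000000000000"\ncipher = AES.new(key, AES.MODE_CBC, iv)\n',
    "secure": 'nonce = os.urandom(12)\ndigest = hashlib.md5(data).hexdigest()\n',
}

if __name__ == "__main__":
    for variant, code in SAMPLES.items():
        for finding in scan(code):
            print(variant, finding.line_no, finding.category)
```

Pattern matching of this kind is only a rough stand-in: it misses flaws that depend on data flow (such as a key reused across contexts) and can flag benign lines, which is presumably one reason a benchmark would pair several evaluation methods rather than rely on regexes alone.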

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Security and Verification in Computing