HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation

Qirui Chen; Jingxian Shuai; Shuangwu Chen; Shenghao Ye; Zijian Wen; Xufei Su; Jie Jin; Jiangming Li; Jun Chen; Xiaobin Tan; and Jian Yang

arXiv:2601.13864 · cs.CR · January 21, 2026

TL;DR

HardSecBench is a benchmark for evaluating the security awareness of large language models in hardware and firmware code generation. It shows that models often produce code that is functionally correct yet still contains security flaws.

Contribution

This work introduces HardSecBench, a novel benchmark with 924 tasks covering hardware and firmware security, and proposes a multi-agent pipeline for reliable security evaluation of LLM-generated code.

Findings

01. LLMs often satisfy functional requirements but overlook security issues

02. Security evaluation results vary significantly with different prompts

03. HardSecBench enables systematic assessment of security risks in hardware code generation

Abstract

Large language models (LLMs) are increasingly integrated into practical hardware and firmware development pipelines for code generation. Existing studies have focused primarily on the functional correctness of LLM-generated code and have paid limited attention to its security. However, LLM-generated code that appears functionally sound may embed security flaws that could cause catastrophic damage after deployment. This research gap motivates a benchmark for assessing security awareness under realistic specifications. In this work, we introduce HardSecBench, a benchmark with 924 tasks spanning Verilog Register Transfer Level (RTL) and firmware-level C, covering 76 hardware-relevant Common Weakness Enumeration (CWE) entries. Each task includes a structured specification, a secure reference implementation, and executable tests. To automate…
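As a rough illustration of the task structure the abstract describes (a specification, a secure reference implementation, and executable tests), the sketch below models one task in Python. Everything in it is hypothetical: the `Task` fields, the `evaluate` helper, the split into functional and security tests, the placeholder CWE label, and the lockable-register example are assumptions made for illustration, not the benchmark's actual schema or pipeline. Plain Python functions stand in for generated Verilog or C, which a real harness would compile and simulate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A candidate is modelled as a plain Python function standing in for
# generated Verilog or C (an assumption made only for this sketch).
Candidate = Callable[[Dict[str, int], int], Dict[str, int]]
Check = Callable[[Candidate], bool]


@dataclass
class Task:
    task_id: str
    language: str            # "verilog" or "c"
    cwe_id: str              # placeholder label, not taken from the paper
    specification: str
    reference: Candidate     # secure reference implementation
    functional_tests: List[Check]
    security_tests: List[Check]


def evaluate(task: Task, candidate: Candidate) -> Dict[str, bool]:
    """Report functional and security outcomes separately."""
    return {
        "functional": all(t(candidate) for t in task.functional_tests),
        "secure": all(t(candidate) for t in task.security_tests),
    }


def secure_write(reg, value):
    # Honour the lock bit: a locked register keeps its value.
    if reg["locked"]:
        return dict(reg)
    return {"value": value, "locked": reg["locked"]}


def naive_write(reg, value):
    # Works when unlocked, but ignores the lock bit entirely.
    return {"value": value, "locked": reg["locked"]}


def writes_when_unlocked(write):
    return write({"value": 0, "locked": 0}, 7)["value"] == 7


def holds_when_locked(write):
    return write({"value": 3, "locked": 1}, 7)["value"] == 3


task = Task(
    task_id="demo-001",
    language="verilog",
    cwe_id="CWE-PLACEHOLDER",
    specification="Lockable configuration register.",
    reference=secure_write,
    functional_tests=[writes_when_unlocked],
    security_tests=[holds_when_locked],
)

print(evaluate(task, naive_write))
print(evaluate(task, secure_write))
```

Reporting the two outcomes separately is what lets a candidate such as `naive_write` show up as functionally correct but insecure, which is the failure mode the findings above highlight.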

Taxonomy

Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Adversarial Robustness in Machine Learning