HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation
Qirui Chen, Jingxian Shuai, Shuangwu Chen, Shenghao Ye, Zijian Wen, Xufei Su, Jie Jin, Jiangming Li, Jun Chen, Xiaobin Tan, and Jian Yang

TL;DR
HardSecBench is a comprehensive benchmark designed to evaluate the security awareness of large language models in hardware and firmware code generation, revealing that models often overlook security flaws despite functional correctness.
Contribution
This work introduces HardSecBench, a novel benchmark with 924 tasks covering hardware and firmware security, and proposes a multi-agent pipeline for reliable security evaluation of LLM-generated code.
Findings
LLMs often satisfy functional requirements but overlook security issues
Security evaluation results vary significantly with different prompts
HardSecBench enables systematic assessment of security risks in hardware code generation
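The first finding can be made concrete with a small sketch in which a candidate passes its functional check but fails its security check. Everything here is hypothetical and for illustration only: the lock-bit register model, the `Task` record, and all function names are assumptions, not taken from HardSecBench or its evaluation pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model: a config register guarded by a lock bit.
# A "candidate" is a write handler with signature (state, value) -> None.
State = Dict[str, int]
Handler = Callable[[State, int], None]
Check = Callable[[Handler], bool]


@dataclass
class Task:
    """Illustrative task record: spec, secure reference, two test groups."""
    spec: str
    reference: Handler
    functional_tests: List[Check]
    security_tests: List[Check]


def insecure_write(state: State, value: int) -> None:
    # Functionally fine: the register takes the new value.
    # The lock bit is never consulted.
    state["cfg"] = value


def secure_write(state: State, value: int) -> None:
    # Same visible behavior, but writes are ignored once lock is set.
    if state["lock"] == 0:
        state["cfg"] = value


def writes_when_unlocked(handler: Handler) -> bool:
    state = {"cfg": 0, "lock": 0}
    handler(state, 0x5A)
    return state["cfg"] == 0x5A


def ignores_write_when_locked(handler: Handler) -> bool:
    state = {"cfg": 0x11, "lock": 1}
    handler(state, 0x5A)
    return state["cfg"] == 0x11


def evaluate(task: Task, candidate: Handler) -> Dict[str, bool]:
    """Report functional and security verdicts separately."""
    return {
        "functional": all(t(candidate) for t in task.functional_tests),
        "security": all(t(candidate) for t in task.security_tests),
    }


task = Task(
    spec="Write cfg; once lock is set, cfg must not change.",
    reference=secure_write,
    functional_tests=[writes_when_unlocked],
    security_tests=[ignores_write_when_locked],
)

insecure_result = evaluate(task, insecure_write)
reference_result = evaluate(task, task.reference)
```

Keeping the two verdicts separate is the point of the sketch: a single pass/fail score over functional tests alone would rate `insecure_write` and `secure_write` identically.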
Abstract
Large language models (LLMs) are increasingly being integrated into practical hardware and firmware development pipelines for code generation. Existing studies have focused primarily on evaluating the functional correctness of LLM-generated code and have paid limited attention to its security. However, LLM-generated code that appears functionally sound may embed security flaws that could cause catastrophic damage after deployment. This research gap motivates us to design a benchmark for assessing security awareness under realistic specifications. In this work, we introduce HardSecBench, a benchmark with 924 tasks spanning Verilog Register Transfer Level (RTL) and firmware-level C, covering 76 hardware-relevant Common Weakness Enumeration (CWE) entries. Each task includes a structured specification, a secure reference implementation, and executable tests. To automate…
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Adversarial Robustness in Machine Learning
