Fixing Hardware Security Bugs with Large Language Models
Baleegh Ahmad, Shailja Thakur, Benjamin Tan, Ramesh Karri, Hammond Pearce

TL;DR
This paper shows that large language models can repair hardware security bugs in Verilog code: an ensemble of LLMs fixes all ten benchmark bugs and outperforms the Cirfix repair tool on Cirfix's own suite of bugs.
Contribution
It introduces a corpus of hardware security bugs and a framework for quantitatively evaluating LLMs on hardware bug repair, and shows that an ensemble of LLMs can outperform the state-of-the-art Cirfix tool.
Findings
An ensemble of LLMs repairs all ten benchmark bugs.
The ensemble outperforms the Cirfix hardware bug repair tool on Cirfix's own suite of bugs.
Framework supports prompt engineering and parameter tuning.
Abstract
Novel AI-based code-writing Large Language Models (LLMs) such as OpenAI's Codex have demonstrated capabilities in many coding-adjacent domains. In this work, we consider how LLMs may be leveraged to automatically repair security-relevant bugs present in hardware designs. We focus on bug repair in code written in the Hardware Description Language Verilog. For this study, we build a corpus of domain-representative hardware security bugs. We then design and implement a framework to quantitatively evaluate the performance of any LLM tasked with fixing the specified bugs. The framework supports design space exploration of prompts (i.e., prompt engineering) and identifying the best parameters for the LLM. We show that an ensemble of LLMs can repair all ten of our benchmarks. This ensemble outperforms the state-of-the-art Cirfix hardware bug repair tool on its own suite of bugs. These results…
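The evaluation idea in the abstract (sweep prompts and LLM parameters, then count which bugs each model repairs) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `explore` and `ensemble_repaired` helpers, the toy models, the `"FIXED:"` check, and the choice of temperature as the swept parameter. The sketch also assumes that an ensemble "repairs" a bug when at least one member model produces a passing fix under some configuration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
# a model maps (bug_id, prompt, temperature) to a candidate repair,
# and a checker decides whether that candidate fixes the bug.
Model = Callable[[str, str, float], str]
Checker = Callable[[str, str], bool]


def explore(model: Model, bugs: Iterable[str], prompts: Iterable[str],
            temperatures: Iterable[float],
            passes: Checker) -> Dict[Tuple[str, float], Set[str]]:
    """Record which bugs are repaired under each (prompt, temperature) pair."""
    bugs = list(bugs)
    results: Dict[Tuple[str, float], Set[str]] = {}
    for prompt, temp in product(list(prompts), list(temperatures)):
        results[(prompt, temp)] = {
            bug for bug in bugs if passes(bug, model(bug, prompt, temp))
        }
    return results


def repaired_by(results: Dict[Tuple[str, float], Set[str]]) -> Set[str]:
    """Bugs repaired under at least one explored configuration."""
    return set().union(*results.values())


def ensemble_repaired(models: Dict[str, Model], bugs: Iterable[str],
                      prompts: Iterable[str], temperatures: Iterable[float],
                      passes: Checker) -> Set[str]:
    """Assumed ensemble rule: a bug counts if any member model repairs it."""
    bugs, prompts, temperatures = list(bugs), list(prompts), list(temperatures)
    fixed: Set[str] = set()
    for model in models.values():
        fixed |= repaired_by(explore(model, bugs, prompts, temperatures, passes))
    return fixed


# Toy stand-ins so the sketch runs without any LLM or simulator.
def toy_model_a(bug: str, prompt: str, temperature: float) -> str:
    if prompt == "detailed" and bug in {"bug1", "bug2"}:
        return "FIXED:" + bug
    return "UNCHANGED:" + bug


def toy_model_b(bug: str, prompt: str, temperature: float) -> str:
    if temperature <= 0.1 and bug == "bug3":
        return "FIXED:" + bug
    return "UNCHANGED:" + bug


def toy_passes(bug: str, candidate: str) -> bool:
    return candidate == "FIXED:" + bug


BUGS = ["bug1", "bug2", "bug3"]
PROMPTS = ["minimal", "detailed"]
TEMPERATURES = [0.1, 0.5]
MODELS = {"a": toy_model_a, "b": toy_model_b}

if __name__ == "__main__":
    print(sorted(ensemble_repaired(MODELS, BUGS, PROMPTS, TEMPERATURES, toy_passes)))
```

In this toy setup neither model repairs every bug alone, but their union does, which is the sense in which an ensemble can cover a benchmark that no single model covers. In a real evaluation, `toy_passes` would be replaced by functional and security checks on the repaired Verilog.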
Taxonomy
Topics: Software Engineering Research · Machine Learning in Materials Science · Adversarial Robustness in Machine Learning
