MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
Yongan Zhang, Zhongzhi Yu, Yonggan Fu, Cheng Wan, Yingyan Celine Lin

TL;DR
This paper introduces MG-Verilog, a multi-grained dataset for LLM-assisted hardware design that pairs Verilog code samples with descriptions at multiple levels of detail, along with a balanced fine-tuning scheme that improves model performance.
Contribution
The paper presents a novel multi-grained Verilog dataset and an open-source infrastructure, along with a balanced fine-tuning scheme to better leverage diverse hardware data for LLMs.
Findings
MG-Verilog enhances LLM performance on hardware design tasks.
The dataset supports various levels of detail for flexible training.
Fine-tuning with MG-Verilog improves model accuracy.
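As a rough illustration of what "multi-grained" and "balanced" could mean in practice, the sketch below pairs one Verilog module with descriptions at two levels of detail and draws fine-tuning pairs evenly across those levels. This is a minimal sketch under stated assumptions, not the paper's implementation: the `VerilogSample` class, the level names `summary` and `detailed`, the `balanced_pairs` helper, and the round-robin sampling rule are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VerilogSample:
    """One code sample with descriptions keyed by granularity level (hypothetical layout)."""
    code: str
    descriptions: Dict[str, str]


# Illustrative data only; the level names are assumptions, not the dataset's schema.
EXAMPLE_SAMPLES = [
    VerilogSample(
        code="module and_gate(input a, input b, output y);\n"
             "  assign y = a & b;\n"
             "endmodule",
        descriptions={
            "summary": "A two-input AND gate.",
            "detailed": "Module and_gate has inputs a and b and output y; "
                        "y is continuously assigned the bitwise AND of a and b.",
        },
    ),
]


def balanced_pairs(
    samples: List[VerilogSample], levels: List[str], n: int, seed: int = 0
) -> List[Tuple[str, str, str]]:
    """Return n (level, description, code) tuples, cycling through levels
    so that each granularity is represented as evenly as possible."""
    rng = random.Random(seed)
    pairs = []
    for i in range(n):
        level = levels[i % len(levels)]  # round-robin over granularity levels
        candidates = [s for s in samples if level in s.descriptions]
        if not candidates:
            raise ValueError(f"no sample has a description at level {level!r}")
        sample = rng.choice(candidates)
        pairs.append((level, sample.descriptions[level], sample.code))
    return pairs
```

Calling `balanced_pairs(EXAMPLE_SAMPLES, ["summary", "detailed"], n=6)` yields three pairs per level. The round-robin rule is only one simple way to keep a single granularity from dominating a training batch; the scheme described in the paper may differ.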
Abstract
Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with the design processes through natural language instructions, thus making hardware design more accessible to developers. However, effectively leveraging LLMs in hardware design necessitates providing domain-specific data during inference (e.g., through in-context learning), fine-tuning, or pre-training. Unfortunately, existing publicly available hardware datasets are often limited in size, complexity, or detail, which hinders the effectiveness of LLMs in hardware design tasks. To address this issue, we first propose a set of criteria for creating high-quality hardware datasets that can effectively enhance LLM-assisted hardware design. Based on these criteria, we propose a…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
