MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation

Yongan Zhang; Zhongzhi Yu; Yonggan Fu; Cheng Wan; Yingyan Celine Lin

arXiv:2407.01910 · cs.LG · July 4, 2024 · 1 citation

TL;DR

This paper introduces MG-Verilog, a multi-grained dataset designed to improve LLM-assisted hardware design by providing diverse, detailed descriptions and code samples, along with a fine-tuning scheme that enhances model performance.

Contribution

The paper presents a novel multi-grained Verilog dataset and an open-source infrastructure, along with a balanced fine-tuning scheme to better leverage diverse hardware data for LLMs.
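This page does not describe how the balanced fine-tuning scheme works. The sketch below is a hedged illustration of what "multi-grained" training pairs could look like. It assumes each Verilog sample carries natural-language descriptions at several granularity levels. The level names (`high_level`, `detailed`, `block_level`) and the names `Sample`, `LEVELS`, and `build_balanced_pairs` are hypothetical and do not come from the paper or the luke-avionics/mg-verilog repository. The function builds prompt/completion pairs so that each description level is used about equally often across the dataset.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical granularity labels (assumption; the paper's actual levels may differ).
LEVELS = ("high_level", "detailed", "block_level")


@dataclass
class Sample:
    code: str                      # a Verilog module as plain text
    descriptions: Dict[str, str]   # maps a granularity level to its description


def build_balanced_pairs(samples: List[Sample], seed: int = 0) -> List[dict]:
    """Pair each code sample with one description, rotating through the
    granularity levels so that every level is used (near-)equally often."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)  # avoid tying a level to the original sample order
    pairs = []
    for i, sample in enumerate(order):
        available = [lvl for lvl in LEVELS if lvl in sample.descriptions]
        if not available:
            continue  # skip samples that have no description at all
        preferred = LEVELS[i % len(LEVELS)]  # round-robin target level
        # Fall back to a random available level if the preferred one is missing.
        level = preferred if preferred in available else rng.choice(available)
        pairs.append({
            "level": level,
            "prompt": sample.descriptions[level],
            "completion": sample.code,
        })
    return pairs


if __name__ == "__main__":
    demo = [
        Sample(code=f"module m{i}(); endmodule",
               descriptions={lvl: f"{lvl} description of m{i}" for lvl in LEVELS})
        for i in range(6)
    ]
    for pair in build_balanced_pairs(demo):
        print(pair["level"], "->", pair["prompt"])
```

The round-robin rotation is one simple way to keep the levels balanced. Weighted or uniform random sampling per sample would be equally plausible alternatives, and nothing here should be read as the authors' actual scheme.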

Findings

01. Enhanced LLM performance on hardware design tasks.

02. The dataset offers descriptions at multiple levels of detail, allowing flexible training.

03. Fine-tuning with MG-Verilog improves model accuracy.

Abstract

Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with the design processes through natural language instructions, thus making hardware design more accessible to developers. However, effectively leveraging LLMs in hardware design necessitates providing domain-specific data during inference (e.g., through in-context learning), fine-tuning, or pre-training. Unfortunately, existing publicly available hardware datasets are often limited in size, complexity, or detail, which hinders the effectiveness of LLMs in hardware design tasks. To address this issue, we first propose a set of criteria for creating high-quality hardware datasets that can effectively enhance LLM-assisted hardware design. Based on these criteria, we propose a…

Code & Models

Repositories

luke-avionics/mg-verilog (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training