OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection

Fan Cui; Chenyang Yin; Kexing Zhou; Youwei Xiao; Guangyu Sun; Qiang Xu; Qipeng Guo; Demin Song; Dahua Lin; Xingcheng Zhang; Yun (Eric) Liang

arXiv:2407.16237 · cs.AR · September 4, 2024



TL;DR

OriGen is an open-source framework that improves RTL code generation by using code augmentation and self-reflection to correct errors. It outperforms existing open-source models and even surpasses GPT-4 Turbo in pass@1 on VerilogEval-Human.

Contribution

The paper introduces OriGen, a novel open-source RTL code generation framework with self-reflection and dataset augmentation, significantly enhancing performance over prior open-source models.

Findings

01

OriGen outperforms other open-source models by 12.8%.

02

OriGen exceeds GPT-4 Turbo in pass@1 on VerilogEval-Human.

03

OriGen improves self-reflection capabilities by 19.9%.

Abstract

Recent studies have demonstrated the significant potential of Large Language Models (LLMs) in generating Register Transfer Level (RTL) code, with notable advancements showcased by commercial models such as GPT-4 and Claude3-Opus. However, these proprietary LLMs often raise concerns regarding privacy and security. While open-source LLMs address these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets. To address this challenge, we introduce OriGen, a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology for generating high-quality, large-scale RTL code. Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets. Furthermore, OriGen can rectify syntactic…
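The summary describes self-reflection only at a high level, as a way for the model to correct errors in its own RTL output. One common shape for such a loop is compiler-feedback repair: generate code, compile it, and if compilation fails, hand the code and the error log back to the model for another attempt. The sketch below is a minimal, hypothetical illustration of that idea, not OriGen's implementation. The names `self_reflect`, `generate`, `compile_check`, and `max_rounds` are assumptions, and `fake_gen` and `fake_check` are stubs standing in for a real LLM and a real Verilog compiler.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumed for illustration, not OriGen's API):
#   generate(spec, feedback) -> Verilog source text
#   compile_check(source)    -> (compiled_ok, error_log)
Generator = Callable[[str, Optional[str]], str]
Checker = Callable[[str], Tuple[bool, str]]


def self_reflect(spec: str, generate: Generator, compile_check: Checker,
                 max_rounds: int = 3) -> Tuple[str, bool, int]:
    """Generate RTL, then retry with compiler feedback until it compiles.

    Returns (final_source, compiled_ok, rounds_used).
    """
    feedback: Optional[str] = None
    source = ""
    for round_idx in range(1, max_rounds + 1):
        source = generate(spec, feedback)
        ok, log = compile_check(source)
        if ok:
            return source, True, round_idx
        # Pass the failing code and its error log back to the generator.
        feedback = f"Previous code:\n{source}\nCompiler errors:\n{log}"
    return source, False, max_rounds


# Stub "model": omits endmodule on the first try, fixes it once given feedback.
def fake_gen(spec: str, feedback: Optional[str]) -> str:
    body = "module top(input a, output y); assign y = a;"
    return body if feedback is None else body + " endmodule"


# Stub "compiler": only checks that the module is closed.
def fake_check(source: str) -> Tuple[bool, str]:
    if "endmodule" in source:
        return True, ""
    return False, "syntax error: missing endmodule"


src, ok, rounds = self_reflect("1-bit buffer", fake_gen, fake_check)
print(ok, rounds)  # → True 2
```

In a real setup, `compile_check` would wrap an actual Verilog tool (for example, Icarus Verilog invoked through `subprocess`), and the round limit keeps a model that cannot fix its error from looping forever.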


Code & Models

Repositories

pku-liang/origen
pytorch · Official



Taxonomy

Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections