OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection
Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, Yun (Eric) Liang

TL;DR
OriGen is an open-source framework that improves RTL code generation by using code augmentation and self-reflection to correct errors, outperforming existing open-source models and even surpassing GPT-4 Turbo in key benchmarks.
Contribution
The paper introduces OriGen, a novel open-source RTL code generation framework with self-reflection and dataset augmentation, significantly enhancing performance over prior open-source models.
Findings
OriGen outperforms other open-source models by 12.8%.
OriGen exceeds GPT-4 Turbo in pass@1 on VerilogEval-Human.
OriGen improves self-reflection capabilities by 19.9%.
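The pass@1 figure cited above is a functional-correctness metric. This page does not say how it was computed; the sketch below assumes the commonly used unbiased pass@k estimator, where n samples are generated per problem and c of them pass the testbench. The function names and the sample counts in the usage note are illustrative and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number that pass, k: budget.
    Assumption: this is the standard estimator 1 - C(n-c, k) / C(n, k);
    the source page does not confirm the exact evaluation procedure.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, so a problem with 3 passing samples out of 10 contributes about 0.3 to the benchmark average.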
Abstract
Recent studies have demonstrated the significant potential of Large Language Models (LLMs) in generating Register Transfer Level (RTL) code, with notable advancements showcased by commercial models such as GPT-4 and Claude3-Opus. However, these proprietary LLMs often raise concerns regarding privacy and security. While open-source LLMs offer solutions to these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets. To address this challenge, we introduce OriGen, a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology for generating high-quality, large-scale RTL code. Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets. Furthermore, OriGen can rectify syntactic…
Taxonomy
Methods: Attention Is All You Need · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections
