EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
Jiahe Shi, Zhengqi Gao, Ching-Yun Ko, Duane Boning

TL;DR
This paper introduces EARL, an entropy-aware reinforcement learning method that improves the reliability of LLM-generated RTL code by focusing training on high-uncertainty tokens, leading to better functional correctness.
Contribution
The paper proposes a novel entropy-guided selective update mechanism in RL for RTL code generation, enhancing training stability and model performance.
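A minimal sketch of what an entropy-guided selective update could look like, assuming a REINFORCE-style token-level loss and a top-fraction selection rule. The function names, the `keep_fraction` parameter, and the toy numbers are illustrative assumptions, not details taken from the paper, whose exact selection criterion and RL objective are not given in this summary.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_mask(entropies, keep_fraction=0.2):
    """Return 1 for the top `keep_fraction` highest-entropy positions, else 0.

    `keep_fraction` is a hypothetical hyperparameter chosen for illustration.
    """
    if not entropies:
        return []
    k = max(1, math.ceil(keep_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(entropies))]

def masked_policy_loss(logprobs, advantages, mask):
    """Policy-gradient surrogate averaged over the selected tokens only."""
    selected = sum(mask)
    if selected == 0:
        return 0.0
    total = sum(-lp * adv * m for lp, adv, m in zip(logprobs, advantages, mask))
    return total / selected

# Toy example: four generated tokens with their next-token distributions.
dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident token (e.g. a keyword)
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain token
    [0.90, 0.10, 0.00, 0.00],  # fairly confident token
    [0.40, 0.30, 0.20, 0.10],  # uncertain token
]
entropies = [token_entropy(d) for d in dists]
mask = high_entropy_mask(entropies, keep_fraction=0.5)

logprobs = [-0.1, -1.2, -0.3, -0.9]   # log-probabilities of the sampled tokens
advantages = [1.0, 1.0, 1.0, 1.0]     # sequence-level reward broadcast per token
loss = masked_policy_loss(logprobs, advantages, mask)
```

In this toy run only the second and fourth tokens receive gradient, so the low-entropy tokens do not dilute the update. A real implementation would compute entropies from model logits in a tensor framework and would likely use a clipped or baseline-corrected objective.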
Findings
Up to 14.7% improvement in functional pass rates
Enhanced training stability and targeted policy updates
Effective focus on high-uncertainty tokens in RTL code
Abstract
Recent advances in large language models (LLMs) have demonstrated significant potential in hardware design automation, particularly in using natural language to synthesize Register-Transfer Level (RTL) code. Despite this progress, a gap remains between model capability and the demands of real-world RTL design, including syntax errors, functional hallucinations, and weak alignment to designer intent. Reinforcement Learning with Verifiable Rewards (RLVR) offers a promising approach to bridge this gap, as hardware provides executable and formally checkable signals that can be used to further align model outputs with design intent. However, in long, structured RTL code sequences, not all tokens contribute equally to functional correctness, and naïvely spreading gradients across all tokens dilutes learning signals. A key insight from our entropy analysis in RTL generation is that only a…
Taxonomy
Topics: Formal Methods in Verification · Embedded Systems Design Techniques · Adversarial Robustness in Machine Learning
