Configuration Over Selection: Hyperparameter Sensitivity Exceeds Model Differences in Open-Source LLMs for RTL Generation
Minghao Shao, Zeng Wang, Weimin Fu, Xiaolong Guo, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri, Muhammad Shafique

TL;DR
This paper demonstrates that hyperparameter configuration has a greater impact on open-source LLM performance for RTL generation than the choice of model itself, emphasizing the importance of tuning over default settings.
Contribution
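It benchmarks 26 open-source LLMs on VerilogEval and RTLLM with synthesis-in-the-loop evaluation, then runs a 108-configuration hyperparameter sweep on three of them, showing that configuration effects outweigh model differences in LLM benchmarking for hardware design.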
Findings
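Hyperparameter choice alone produces absolute pass-rate gaps of up to 25.5% between the best and worst settings of the same model, about 5x the average spread across model families at their default settings.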
Optimal configurations do not transfer across different benchmarks.
Default hyperparameters confound model capability assessment.
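To make the first finding concrete, here is a minimal sketch of how a decoding grid could be enumerated and how a best-versus-worst gap could be computed for one model. The axes and values in `GRID` are hypothetical: they were chosen only so that the product comes to 108 configurations, and the parameters actually swept are not listed in this summary. The helper names `enumerate_configs` and `best_worst_gap` are likewise illustrative.

```python
from itertools import product

# Hypothetical decoding axes (4 * 3 * 3 * 3 = 108 combinations).
# The real sweep's parameters and values are not given in this summary.
GRID = {
    "temperature": [0.0, 0.4, 0.8, 1.2],
    "top_p": [0.8, 0.9, 1.0],
    "top_k": [20, 50, 100],
    "repetition_penalty": [1.0, 1.1, 1.2],
}


def enumerate_configs(grid):
    """Return every combination of the grid's values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def best_worst_gap(pass_rates):
    """Absolute gap between the best and worst pass rate, in percentage points."""
    return max(pass_rates) - min(pass_rates)


configs = enumerate_configs(GRID)
```

In use, each entry of `configs` would be passed to the model's generation call, the resulting pass rate recorded, and `best_worst_gap` applied to the list of pass rates for that model. Comparing that gap with the spread of default-configuration pass rates across models is the comparison the finding describes.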
Abstract
Benchmarking of open-source LLMs for hardware design focuses on which LLMs to use, while treating inference-time decoding configuration as a secondary concern. This work shows that it matters more how an LLM is configured than which model is selected. Benchmarking 26 open-source LLMs on VerilogEval and RTLLM with synthesis-in-the-loop evaluation, the study first maps the current capability landscape and then conducts an extensive 108-configuration hyperparameter sweep on three prominent models. The sweep reveals absolute pass-rate gaps of up to 25.5% between the best and worst settings for the same LLM, which is 5x larger than the average spread observed across various model families under their respective default configurations. Ranking all configurations by Spearman's ρ across the two benchmark suites yields near-zero correlation, demonstrating that optimal configurations do not…
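The transfer check described in the abstract can be illustrated with a small, dependency-free sketch of Spearman's rank correlation: rank the same configurations by pass rate on each benchmark, then take the Pearson correlation of the two rank vectors. This is a generic textbook formulation, not the authors' evaluation code, and the function names are assumptions made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `xs` and `ys` would hold the pass rates of the same configurations, in the same order, on VerilogEval and RTLLM respectively. A value near 1 would mean the configuration ranking carries over from one benchmark to the other; a value near 0 means the ranking on one benchmark says little about the ranking on the other.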
