DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis
Zeju Li, Changran Xu, Zhengyuan Shi, Zedong Peng, Yi Liu, Yunhao Zhou, Lingfeng Zhou, Chengyu Ma, Jianyuan Zhong, Xi Wang, Jieru Zhao, Zhufei Chu, Xiaoyan Yang, Qiang Xu

TL;DR
DeepCircuitX is a comprehensive, multi-level RTL dataset with detailed annotations, netlists, and PPA metrics, designed to improve machine learning models for RTL understanding, generation, and PPA prediction.
Contribution
The paper introduces a holistic, repository-level RTL dataset with Chain of Thought annotations, enabling training and evaluation of large language models on hardware design tasks.
Findings
Fine-tuning LLMs on DeepCircuitX improves their RTL understanding and generation.
The dataset enables accurate PPA prediction directly from RTL code.
Human evaluations confirm the dataset's high quality and utility.
Abstract
This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating…
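The four annotation levels named in the abstract (repository, file, module, block) can be pictured as a nested record. The sketch below is only an illustration under stated assumptions: every class name, field name, and unit is hypothetical, and it is not the dataset's actual schema or loading API. Attaching PPA metrics at the module level is likewise an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class PPA:
    # Hypothetical fields and units; the real dataset may report PPA differently.
    power_mw: float
    delay_ns: float
    area_um2: float


@dataclass
class Block:
    code: str
    description: str  # block-level annotation


@dataclass
class Module:
    name: str
    code: str
    description: str  # module-level annotation
    blocks: List[Block] = field(default_factory=list)
    ppa: Optional[PPA] = None  # assumed to attach per module


@dataclass
class RTLFile:
    path: str
    description: str  # file-level annotation
    modules: List[Module] = field(default_factory=list)


@dataclass
class Repository:
    name: str
    description: str  # repository-level annotation
    files: List[RTLFile] = field(default_factory=list)


def iter_annotations(repo: Repository) -> Iterator[Tuple[str, str, str]]:
    """Walk the hierarchy top-down, yielding (level, identifier, description)."""
    yield ("repository", repo.name, repo.description)
    for f in repo.files:
        yield ("file", f.path, f.description)
        for m in f.modules:
            yield ("module", m.name, m.description)
            for i, b in enumerate(m.blocks):
                yield ("block", f"{m.name}#{i}", b.description)


# A made-up example record, not taken from the dataset.
always_block = Block(
    code="always @(posedge clk) if (rst) q <= 0; else q <= q + 1;",
    description="Synchronous reset; otherwise increment q each clock edge.",
)
counter = Module(
    name="counter",
    code="module counter(input clk, input rst, output reg [3:0] q); ... endmodule",
    description="4-bit up counter with synchronous reset.",
    blocks=[always_block],
    ppa=PPA(power_mw=0.1, delay_ns=1.0, area_um2=50.0),  # placeholder numbers
)
repo = Repository(
    name="example-repo",
    description="Toy repository containing a single counter design.",
    files=[RTLFile(path="rtl/counter.v", description="Counter source file.", modules=[counter])],
)

levels = [level for level, _, _ in iter_annotations(repo)]
```

Flattening the hierarchy this way gives one (level, identifier, description) triple per annotated unit, which is one plausible way to pair descriptions with code at each granularity for understanding or generation tasks.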
Taxonomy
Topics: Speech Recognition and Synthesis
