ChipLingo: A Systematic Training Framework for Large Language Models in EDA
Lei Li, Xingwen Yu, Jianguo Ni, Junxuan Zhu, Jieqiong Zhang, Jian Zhao, Zhi Liu

TL;DR
ChipLingo is a comprehensive training framework that adapts large language models specifically for electronic design automation, improving their domain expertise and retrieval capabilities.
Contribution
The paper introduces a systematic pipeline for domain-specific LLM training in EDA, including data curation, domain-adaptive pretraining, and instruction alignment, along with an internal benchmark.
Findings
ChipLingo-8B achieves 59.7% accuracy on EDA-Bench.
ChipLingo-32B reaches 70.02% accuracy, approaching commercial models.
QA augmentation and explicit RAG training improve domain performance and retrieval utilization.
Abstract
With the rapid advancement of semiconductor technology, Electronic Design Automation (EDA) has become an increasingly knowledge-intensive and document-driven engineering domain. Although large language models (LLMs) have shown strong general capabilities, applying them directly to EDA remains challenging due to limited domain expertise, cross-tool knowledge confusion, and degraded retrieval-augmented generation (RAG) performance after domain training. To address these issues, this paper presents ChipLingo, a systematic training pipeline for domain-adapted LLMs tailored to EDA scenarios. ChipLingo consists of three stages: domain corpus construction with multi-source data curation and QA augmentation, domain-adaptive pretraining with comparisons of different parameter training strategies, and instruction alignment with RAG scenario training under diverse retrieval conditions. We also…
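The third stage, RAG scenario training under diverse retrieval conditions, can be illustrated with a minimal sketch. The excerpt above does not say which retrieval conditions the authors use or how prompts are formatted, so everything here is an assumption made for illustration: the four condition names, the prompt template, and the `RagExample` and `build_rag_example` names are hypothetical. The idea shown is only that a single QA pair can be expanded into several training examples whose retrieved context varies in quality.

```python
import random
from dataclasses import dataclass


@dataclass
class RagExample:
    prompt: str
    answer: str
    condition: str


# Hypothetical retrieval conditions; the paper's actual set is not given in the excerpt.
CONDITIONS = ("relevant", "noisy", "irrelevant", "empty")


def build_rag_example(question, answer, gold_doc, distractors, condition, rng):
    """Assemble one training example under a named retrieval condition."""
    if condition == "relevant":
        docs = [gold_doc]                      # only the supporting document
    elif condition == "noisy":
        docs = [gold_doc] + list(distractors)  # supporting document mixed with distractors
        rng.shuffle(docs)
    elif condition == "irrelevant":
        docs = list(distractors)               # retrieval missed the supporting document
    elif condition == "empty":
        docs = []                              # retrieval returned nothing
    else:
        raise ValueError(f"unknown condition: {condition}")

    context = "\n".join(f"[doc {i + 1}] {d}" for i, d in enumerate(docs))
    if not context:
        context = "(no documents retrieved)"
    # Assumed prompt template; the target answer is kept identical across conditions.
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return RagExample(prompt=prompt, answer=answer, condition=condition)


def expand_qa_pair(question, answer, gold_doc, distractors, seed=0):
    """Produce one example per condition from a single QA pair."""
    rng = random.Random(seed)
    return [
        build_rag_example(question, answer, gold_doc, distractors, c, rng)
        for c in CONDITIONS
    ]
```

Keeping the target answer fixed while the context varies is one plausible way to encourage a model to use retrieved text when it helps and to fall back on domain knowledge when it does not. Whether ChipLingo does this is not stated in the excerpt.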
