TL;DR
Rule2DRC is a large-scale benchmark for evaluating LLM agents that translate natural language design rules into executable DRC scripts. It scores scripts by execution correctness rather than code similarity.
Contribution
It provides a large-scale benchmark with execution-based scoring and a test generation method that improves script selection accuracy.
Findings
The benchmark includes 1,000 rule-to-script tasks and 13,921 evaluation chip layouts.
The evaluation pipeline measures functional correctness via DRC execution outcomes.
SplitTester significantly enhances script selection performance using execution feedback.
Abstract
Manufacturable chip layouts must satisfy thousands of geometry-based design rules, and design rule checking (DRC) enforces them by running executable DRC scripts on layouts. Translating natural language rules into correct DRC scripts is labor-intensive and requires specialized expertise, motivating LLM agents for DRC script synthesis and debugging. However, existing benchmarks have small evaluation sets and often evaluate scripts by code similarity rather than execution correctness, and prior machine learning-based methods either ignore execution feedback or require labeled test layouts as agent input. To this end, we introduce Rule2DRC, a large-scale benchmark for DRC script coding agents with 1,000 rule-to-script tasks and 13,921 evaluation chip layouts for execution-based scoring. Rule2DRC provides an evaluation pipeline that measures functional correctness via DRC execution…
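To illustrate why execution-based scoring can differ from code similarity, here is a minimal Python sketch. It is not the Rule2DRC pipeline: real DRC scripts run in a DRC engine, whereas this sketch stands in for a script with a plain Python function that returns whether a layout violates a rule. The names `LabeledLayout`, `execution_score`, `reference_min_width`, and `candidate_x_only`, the rectangle encoding, and the 0.5 minimum-width threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical encoding: a rectangle is (x_min, y_min, x_max, y_max),
# and a layout is a list of rectangles on a single layer.
Rect = Tuple[float, float, float, float]
Layout = List[Rect]
Script = Callable[[Layout], bool]  # returns True if the layout violates the rule


@dataclass
class LabeledLayout:
    layout: Layout
    violates: bool  # ground-truth label for this layout


def reference_min_width(layout: Layout, min_width: float = 0.5) -> bool:
    """Stand-in for a correct script: flag any shape narrower than min_width
    in either dimension."""
    return any(min(x2 - x1, y2 - y1) < min_width for x1, y1, x2, y2 in layout)


def candidate_x_only(layout: Layout, min_width: float = 0.5) -> bool:
    """Stand-in for a buggy script: it checks only the horizontal extent.
    Textually it is almost identical to the reference."""
    return any((x2 - x1) < min_width for x1, y1, x2, y2 in layout)


def execution_score(script: Script, cases: List[LabeledLayout]) -> float:
    """Fraction of layouts on which the script's verdict matches the label."""
    if not cases:
        raise ValueError("at least one labeled layout is required")
    correct = sum(script(case.layout) == case.violates for case in cases)
    return correct / len(cases)


cases = [
    LabeledLayout([(0, 0, 1, 1)], violates=False),               # wide enough
    LabeledLayout([(0, 0, 0.3, 1)], violates=True),              # too narrow in x
    LabeledLayout([(0, 0, 1, 0.3)], violates=True),              # too narrow in y
    LabeledLayout([(0, 0, 2, 2), (3, 0, 4, 1)], violates=False), # two legal shapes
]

print(execution_score(reference_min_width, cases))
print(execution_score(candidate_x_only, cases))
```

In this toy setup the buggy candidate differs from the reference by a few characters, so a similarity metric would rate it highly, yet it misses the layout that is too narrow only in the vertical direction. Running both functions on labeled layouts exposes that gap.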
