How Can Reinforcement Learning Achieve Expert-level Placement?
Ruo-Tong Chen, Ke Xue, Chengrui Gao, Yunqi Shi, Tian Xu, Peng Xie, Siyuan Xu, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou

TL;DR
This paper proposes a reinforcement learning method for chip placement that learns its reward model from expert layouts, allowing it to reach expert-level results.
Contribution
It introduces a reward modeling approach based on trajectories inferred from expert layouts, which avoids formalizing the intricate expert design process, and demonstrates efficient learning from limited data.
Findings
Framework learns effectively from a single design.
Model generalizes well to unseen cases.
Achieves expert-level placement quality.
Abstract
Chip placement is a critical step in physical design. While reinforcement learning (RL)-based methods have recently emerged, their training primarily focuses on wirelength optimization, and they therefore often fail to achieve expert-quality layouts. We identify reward design as the primary cause of the performance gap with experts. Instead of formalizing the intricate processes experts follow, we learn a reward model directly from expert layouts. Our approach starts from the final expert layouts and infers step-by-step expert trajectories. Using these trajectories as demonstrations or preferences, we train a model that captures the implicit rewards latent in expert results. Experiments show that our framework can learn efficiently from even a single design and generalize well to unseen cases.
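The two steps named in the abstract, inferring a step-by-step trajectory from a final layout and training on preferences, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `Macro` record, the largest-area-first ordering in `infer_trajectory`, and the use of a Bradley-Terry loss in `preference_loss` are all hypothetical choices, since the abstract does not state the ordering rule or the loss.

```python
import math
from typing import List, Tuple

# Hypothetical layout record: (name, x, y, area). Not taken from the paper.
Macro = Tuple[str, float, float, float]


def infer_trajectory(final_layout: List[Macro]) -> List[Tuple[str, float, float]]:
    """Turn a finished layout into an ordered sequence of placement steps.

    Assumption: macros are placed largest-area first, with ties broken by
    name. The ordering actually used by the paper is not given in the abstract.
    """
    ordered = sorted(final_layout, key=lambda m: (-m[3], m[0]))
    return [(name, x, y) for name, x, y, _ in ordered]


def preference_loss(r_expert: float, r_other: float) -> float:
    """Bradley-Terry negative log-likelihood that the expert trajectory wins.

    Computes log(1 + exp(-(r_expert - r_other))) in a numerically stable way.
    The loss choice is an assumption; the abstract only says "preferences".
    """
    d = r_expert - r_other
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

In this sketch, `r_expert` and `r_other` stand for the scalar scores a reward model would assign to an expert-derived trajectory and to an alternative one. Minimizing the loss pushes the model to score the expert trajectory higher.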
