MaskOpt: A Large-Scale Mask Optimization Dataset to Advance AI in Integrated Circuit Manufacturing

Yuting Hu; Lei Zhuang; Hua Xiang; Jinjun Xiong; Gi-Joon Nam

arXiv:2512.20655·cs.LG·December 25, 2025

Open Access · 3 Reviews

TL;DR

MaskOpt is a comprehensive large-scale dataset designed to improve deep learning-based mask optimization in IC manufacturing by incorporating real design contexts, standard-cell hierarchy, and neighboring geometries.

Contribution

The paper introduces MaskOpt, a large-scale, real IC design dataset with context-aware features for advancing deep learning in mask optimization.

Findings

01. Deep learning models show varied performance on MaskOpt benchmarks.

02. Context size significantly impacts mask optimization accuracy.

03. Cell-aware inputs improve deep learning model effectiveness.
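
As a rough illustration of what "context size" and "cell-aware inputs" could mean in practice, the sketch below crops a window around one standard cell, padded by a configurable margin of neighboring geometry, and stacks a binary cell-tag channel on top of it. This is only a minimal sketch under stated assumptions: the function name `crop_with_context`, the rasterized `layout` array, the `(row0, col0, row1, col1)` box convention, and the two-channel output are all hypothetical and are not taken from the MaskOpt paper or its data format.

```python
import numpy as np

def crop_with_context(layout, cell_box, margin):
    """Crop a window around cell_box, padded by `margin` pixels on every side.

    Hypothetical helper, not the dataset's actual pipeline.
    layout   : 2-D array, rasterized layer (nonzero = metal).
    cell_box : (row0, col0, row1, col1), half-open, assumed to lie inside layout.
    Returns a (2, H, W) array: channel 0 is the geometry in the window,
    channel 1 is a binary tag over the target cell's footprint.
    """
    r0, c0, r1, c1 = cell_box
    h, w = layout.shape
    # Window bounds in layout coordinates (may extend past the layout edge).
    wr0, wc0, wr1, wc1 = r0 - margin, c0 - margin, r1 + margin, c1 + margin
    out = np.zeros((2, wr1 - wr0, wc1 - wc0), dtype=layout.dtype)
    # Clip the window to the valid layout area; the rest stays zero-padded.
    sr0, sc0 = max(wr0, 0), max(wc0, 0)
    sr1, sc1 = min(wr1, h), min(wc1, w)
    out[0, sr0 - wr0:sr1 - wr0, sc0 - wc0:sc1 - wc0] = layout[sr0:sr1, sc0:sc1]
    # Cell tag: ones over the target cell, zeros over the surrounding context.
    out[1, margin:margin + (r1 - r0), margin:margin + (c1 - c0)] = 1
    return out

layout = np.ones((6, 6), dtype=np.uint8)
tile = crop_with_context(layout, cell_box=(0, 0, 2, 3), margin=1)
print(tile.shape)
```

Increasing `margin` widens the neighborhood a model sees around the same cell, which is one way the effect of context size could be studied; dropping channel 1 gives a cell-unaware input for comparison.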

Abstract

As integrated circuit (IC) dimensions shrink below the lithographic wavelength, optical lithography faces growing challenges from diffraction and process variability. Model-based optical proximity correction (OPC) and inverse lithography technique (ILT) remain indispensable but computationally expensive, requiring repeated simulations that limit scalability. Although deep learning has been applied to mask optimization, existing datasets often rely on synthetic layouts, disregard standard-cell hierarchy, and neglect the surrounding contexts around the mask optimization targets, thereby constraining their applicability to practical mask optimization. To advance deep learning for cell- and context-aware mask optimization, we present MaskOpt, a large-scale benchmark dataset constructed from real IC designs at the 45 nm node. MaskOpt includes 104,714 metal-layer tiles and 121,952…

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The motivation is clear. Mask optimization is indeed a core bottleneck in modern lithography because it requires repeated forward lithography simulation and careful process-window optimization.

2. The dataset is not just random image crops: tiles are explicitly aligned with standard-cell placements.

Weaknesses

1. The contribution is incremental and the novelty is overstated. The core claimed contribution is “a large-scale dataset for mask optimization that captures cell hierarchy and context.” But LithoBench already provides >100k labeled via tiles plus tens of thousands of metal-layer tiles, including both target and optimized masks. Relative to that, MaskOpt differs mainly in (1) how the clips are cropped with different margins, (2) adding a per-sample "cell tag" channel, which is more like a incre

Reviewer 02 · Rating 2 · Confidence 5

Strengths

- Benchmarks include real chip designs synthesized on an open-source PDK. In particular, the metal layers are closer to true design patterns than in the previous LithoBench, where metal-layer designs are mostly synthetic data.

- Multiple ML solutions are evaluated on the newly introduced benchmark.

- The amount of data is significantly larger than in existing benchmarks, potentially benefiting ML-based solutions.

Weaknesses

I have several concerns and comments on the benchmark.

1. The authors mention that the clipping is based on standard cells (plus different sizes of surrounding context), but the motivation for doing so is unclear:

a. Standard-cell (STC) patterns are usually quite limited; for each technology node there are typically on the order of hundreds to 1K STCs. Based on this, I really doubt the diversity and coverage of the benchmark.

b. If the clips contain many similar patterns with different surrounding contexts, this will only p

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Clear presentation. This paper clearly introduces the background of optical lithography and other concepts.

2. The provided dataset contains much more data than the previous benchmark dataset, and it is closer to a real application because it does not use synthetic tiles.

3. This benchmark enables cell-based hierarchical OPC, which provides more information for mask optimization.

Weaknesses

1. This paper doesn’t clearly explain the advantages and weaknesses of model-based OPC and ILT. Experimental results show some optimization preferences (e.g., methods belonging to model-based OPC tend to optimize the L2 metric, while methods from ILT show greater improvement in EPE), but these are not clearly explained.

2. Few baseline methods are included for comparison. The experiment only adopts four baseline methods. For OPC mask prediction, Neural-ILT and CFNO are adaptable but not include

Taxonomy

Topics: Advancements in Photolithography Techniques · VLSI and FPGA Design Techniques · Advanced optical system design