Iceberg: Enhancing HLS Modeling with Synthetic Data

Zijian Ding; Tung Nguyen; Weikai Li; Aditya Grover; Yizhou Sun; Jason Cong

arXiv:2507.09948·cs.LG·July 22, 2025

TL;DR

Iceberg introduces a synthetic data augmentation method for HLS prediction models. With few-shot adaptation, it improves geometric mean modeling accuracy by 86.4% across six real-world applications and yields up to 2.47x better offline design space exploration (DSE) performance.

Contribution

The paper proposes Iceberg, a novel synthetic data augmentation approach that enhances HLS modeling accuracy and adaptability using LLM-generated programs and weak labels.

Findings

01

86.4% improvement in geometric mean modeling accuracy with few-shot adaptation

02

2.47x and 1.12x better offline DSE performance when adapting to two different test datasets

03

Effective generalization to six real-world applications

Abstract

Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by $86.4\%$ when adapting to six real-world applications with few-shot examples and achieves $2.47\times$ and $1.12\times$ better offline DSE performance when adapting to two different test datasets. Our open-source code is available at https://github.com/UCLA-VAST/iceberg
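The abstract reports its accuracy gain as a geometric mean over six applications. The sketch below shows how such an aggregate could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, and it assumes that "improvement" means the relative reduction in geometric mean prediction error, which the abstract does not specify.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_improvement(baseline_errors, new_errors):
    """Relative reduction in geometric mean error, as a fraction in (-inf, 1).

    ASSUMPTION: accuracy improvement is read as error reduction. The source
    abstract does not define the metric, so this is one possible reading.
    """
    return 1.0 - geometric_mean(new_errors) / geometric_mean(baseline_errors)
```

Each argument would hold one per-application error (for example, six values for six applications). A geometric mean is a common choice for this kind of summary because it keeps a single application with a very large error from dominating the aggregate.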

Taxonomy

Topics: Distributed and Parallel Computing Systems