Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding
Patrick Iff, Tommaso Bonato, Maciej Besta, Luca Benini, Torsten Hoefler

TL;DR
This paper explores how the physical placement of reticles in wafer-scale systems built with wafer-on-wafer hybrid bonding shapes the achievable network topology, and shows that better placements improve throughput, reduce latency, and lower energy consumption for the communication that constrains large language models.
Contribution
The paper introduces four reticle placement strategies (Aligned, Interleaved, Rotated, and Contoured) that use wafer-on-wafer hybrid bonding to improve network performance in wafer-scale systems, starting from a 2D mesh-like baseline.
Findings
Throughput increased by up to 250%
Latency reduced by up to 36%
Energy per byte decreased by up to 38%
Abstract
Transformer-based large language models are increasingly constrained by data movement as communication bandwidth drops sharply beyond the chip boundary. Wafer-scale integration using wafer-on-wafer hybrid bonding alleviates this limitation by providing ultra-high bandwidth between reticles on bonded wafers. In this paper, we investigate how the physical placement of reticles on wafers influences the achievable network topology and the resulting communication performance. Starting from a 2D mesh-like baseline, we propose four reticle placements (Aligned, Interleaved, Rotated, and Contoured) that improve throughput by up to 250%, reduce latency by up to 36%, and decrease energy per transmitted byte by up to 38%.
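To make the reported percentages easier to compare, the short sketch below converts each "up to" figure into a multiplicative factor. It assumes each percentage is a relative change with respect to the same reference design, which the abstract implies but does not spell out; the function and variable names are illustrative and do not come from the paper.

```python
def relative_factor(percent_change: float) -> float:
    """Convert a signed percent change into a multiple of the reference value."""
    return 1.0 + percent_change / 100.0


# Upper bounds quoted in the abstract, signed so that "+" is an increase
# and "-" is a reduction. Assumption: all three are relative changes.
reported = {
    "throughput": +250.0,
    "latency": -36.0,
    "energy_per_byte": -38.0,
}

factors = {name: relative_factor(pct) for name, pct in reported.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x the reference")
```

Under that reading, a 250% throughput increase corresponds to 3.5 times the reference throughput, while the 36% and 38% reductions leave 0.64 and 0.62 of the reference latency and energy per byte, respectively.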
Taxonomy
Topics: 3D IC and TSV technologies · Interconnection Networks and Systems · VLSI and FPGA Design Techniques
