A systolic update scheme to overcome memory bandwidth limitations in GPU-accelerated FDTD simulations
Jesse Lu, David Qu, Jim Qu, Ryan Fong, Geun Ho Ahn, Jelena Vuckovic

TL;DR
This paper presents a systolic update scheme for 3D FDTD simulations on GPUs that raises throughput by easing the memory-bandwidth bottleneck, enabling faster simulation of the photonic interconnects used in AI supercomputers.
Contribution
The paper introduces a novel systolic update scheme for FDTD that minimizes global memory access and synchronization, enhancing GPU-based photonic simulation efficiency.
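To make the idea of trading global memory traffic for on-chip work concrete, here is a minimal 1D sketch in plain Python. It is not the paper's systolic scheme: it uses overlapped temporal blocking, a simpler related technique in which each tile is loaded once with a halo of k cells, advanced k time steps locally, and only its interior is written back. The function names, the coefficient `c`, and the fixed-end boundary treatment are all assumptions chosen for illustration.

```python
import math


def step(e, h, c=0.5):
    """One leapfrog step of a 1D Yee-style update, in place.

    e[0] and h[-1] are never updated (fixed ends); this boundary
    treatment is an assumption of the sketch.
    """
    n = len(e)
    for i in range(n - 1):
        h[i] += c * (e[i + 1] - e[i])
    for i in range(1, n):
        e[i] += c * (h[i] - h[i - 1])


def advance_reference(e, h, k, c=0.5):
    """Advance k steps, sweeping the whole domain on every step."""
    e, h = e[:], h[:]
    for _ in range(k):
        step(e, h, c)
    return e, h


def advance_tiled(e, h, k, tile, c=0.5):
    """Advance k steps tile by tile, reading each tile (plus halo) once."""
    n = len(e)
    e_new, h_new = e[:], h[:]
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        a, b = max(0, lo - k), min(n, hi + k)  # tile plus halo of k cells
        e_loc, h_loc = e[a:b], h[a:b]          # single "load" from global state
        for _ in range(k):
            step(e_loc, h_loc, c)              # local steps, no global access
        # Halo cells are contaminated by the artificial tile edges, so only
        # the interior [lo, hi) is written back.
        e_new[lo:hi] = e_loc[lo - a:hi - a]
        h_new[lo:hi] = h_loc[lo - a:hi - a]
    return e_new, h_new


# Small demonstration: a Gaussian pulse on a 64-cell grid.
N, K, TILE = 64, 4, 16
e0 = [math.exp(-((i - 32) / 4.0) ** 2) for i in range(N)]
h0 = [0.0] * N

e_ref, h_ref = advance_reference(e0, h0, K)
e_til, h_til = advance_tiled(e0, h0, K, TILE)
max_diff = max(
    max(abs(x - y) for x, y in zip(e_ref, e_til)),
    max(abs(x - y) for x, y in zip(h_ref, h_til)),
)
print("max difference between tiled and reference:", max_diff)
```

The tiled version recomputes the halo cells redundantly in neighbouring tiles. That redundancy is the cost of this particular technique, and it grows with k.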
Findings
Achieves 0.15 trillion cell updates per second on an Nvidia H100 GPU.
Reduces global memory access to boundary values only.
Improves simulation throughput for photonic design workflows.
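For a rough sense of scale, the sketch below estimates how fast a naive FDTD kernel could run if every time step streamed all six field components through global memory. Every constant is an illustrative assumption rather than a figure from the paper: single-precision storage, one read and one write per component per step, no material coefficients, and a round 3 TB/s of memory bandwidth.

```python
# Back-of-envelope bound for an FDTD kernel limited purely by memory traffic.
# All constants are illustrative assumptions, not values from the paper.

FIELD_COMPONENTS = 6     # Ex, Ey, Ez, Hx, Hy, Hz per Yee cell
BYTES_PER_VALUE = 4      # assumes single-precision storage
ACCESSES_PER_VALUE = 2   # one read plus one write per component per step


def bytes_per_cell_update() -> int:
    """Minimum global-memory traffic per cell per time step (48 bytes here)."""
    return FIELD_COMPONENTS * BYTES_PER_VALUE * ACCESSES_PER_VALUE


def bandwidth_bound_rate(bandwidth_bytes_per_s: float) -> float:
    """Upper bound on cell updates per second for a bandwidth-limited kernel."""
    return bandwidth_bytes_per_s / bytes_per_cell_update()


ASSUMED_BANDWIDTH = 3.0e12   # hypothetical round figure, bytes per second
REPORTED_RATE = 0.15e12      # cell updates per second, from the findings above

naive_bound = bandwidth_bound_rate(ASSUMED_BANDWIDTH)
print(f"bandwidth-bound estimate: {naive_bound:.3e} cell updates/s")
print(f"reported rate / estimate: {REPORTED_RATE / naive_bound:.2f}")
```

Under these assumptions the reported rate is higher than the streaming bound, which is only possible if most field values are reused on chip and never return to global memory between steps. The estimate is optimistic for a naive kernel, since it ignores material coefficients and imperfect caching of neighbour reads.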
Abstract
The exponential growth of artificial intelligence has fueled the development of high-bandwidth photonic interconnect fabrics as a critical component of modern AI supercomputers. As demand for AI compute and connectivity continues to grow, high-throughput photonic simulation engines that accelerate, and even revolutionize, photonic design and verification workflows will become increasingly indispensable to the integrated photonics industry. Unfortunately, the finite-difference time-domain (FDTD) method, the workhorse of photonic simulation, is memory-intensive but computationally lightweight, and is therefore fundamentally misaligned with modern computational platforms, which are built for compute-intensive workloads. This paper introduces a systolic update scheme for the FDTD method, which…
