PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration
Yue Jiet Chong, Yimin Wang, Zhen Wu, Xuanyao Fong

TL;DR
This paper introduces PICNIC, a silicon photonic interconnected chiplet system with in-memory computing for LLM inference, achieving significant speedup and efficiency improvements over existing GPUs.
Contribution
It presents a novel 3D-stacked chiplet architecture with photonic interconnects and in-memory computing tailored for large language model inference acceleration.
Findings
Achieves 3.95x speedup over Nvidia A100.
Attains 30x efficiency improvement over Nvidia A100.
Reaches 57x efficiency improvement over Nvidia H100 at similar throughput once the chiplet clustering and power gating scheme (CCPG) is applied.
Abstract
This paper presents a 3D-stacked, chiplet-based large language model (LLM) inference accelerator consisting of non-volatile in-memory-computing processing elements (PEs) and an Inter-PE Computational Network (IPCN), interconnected via silicon photonics to address communication bottlenecks. An LLM mapping scheme was developed to optimize hardware scheduling and workload mapping. Simulation results show that it achieves a 3.95x speedup and a 30x efficiency improvement over the Nvidia A100 before the chiplet clustering and power gating scheme (CCPG) is applied. With CCPG, the system scales to accommodate larger models and improves efficiency further, attaining a 57x efficiency improvement over the Nvidia H100 at similar throughput.
Taxonomy
Topics: Neural Networks and Reservoir Computing · Photonic and Optical Devices · Optical Network Technologies
