Homodyne Photonic Tensor Processor exceeds 1,000-TOPS
Lian Zhou, Kaiwen Xue, Yun-Jhu Lee, Chun-Ho Lee, Yuan Li, Kiwon Kwon, Weipeng Zhang, Songlin Zhao, Jason Moraes, Niranjan Bhatia, Ryan Hamerly, Mengjie Yu, Zaijun Chen

TL;DR
This paper presents a homodyne photonic tensor processor capable of exceeding 1,000 TOPS, leveraging integrated optical components for high-speed, energy-efficient matrix multiplication suitable for AI applications.
Contribution
It introduces a coherent homodyne integrated circuit with record-scale parallelism and throughput, enabling ultra-high-speed photonic computing for AI workloads.
Findings
Achieved over 1,000 TOPS throughput with 6-7 bit accuracy.
Integrated 256×256 homodyne units within a single chip.
Demonstrated 330 TOPS/W energy efficiency.
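The headline figures above imply a few derived quantities that are easy to check by hand. The sketch below is a back-of-envelope calculation, not a result reported in the summary. It assumes that one multiply-accumulate counts as two operations, that all 256×256 units are active on every clock cycle, and that the throughput and efficiency figures refer to the same operating point. The function names are hypothetical.

```python
# Back-of-envelope arithmetic on the headline figures.
# Assumptions (not stated in the summary): 1 MAC = 2 operations,
# every unit is busy on every cycle, and the throughput and
# efficiency numbers describe the same operating point.

THROUGHPUT_OPS_PER_S = 1_000e12   # 1,000 TOPS
EFFICIENCY_OPS_PER_J = 330e12     # 330 TOPS/W = 330e12 operations per joule
ARRAY_SIDE = 256                  # 256 x 256 homodyne units
OPS_PER_MAC = 2                   # assumption


def energy_per_op_joules(ops_per_joule):
    """Energy spent per operation, the reciprocal of the efficiency."""
    return 1.0 / ops_per_joule


def implied_clock_hz(ops_per_s, side, ops_per_mac):
    """Clock rate needed if each unit performs one MAC per cycle."""
    return ops_per_s / (side * side * ops_per_mac)


def implied_power_watts(ops_per_s, ops_per_joule):
    """Power drawn if both figures hold at the same time."""
    return ops_per_s / ops_per_joule


if __name__ == "__main__":
    e = energy_per_op_joules(EFFICIENCY_OPS_PER_J)
    f = implied_clock_hz(THROUGHPUT_OPS_PER_S, ARRAY_SIDE, OPS_PER_MAC)
    p = implied_power_watts(THROUGHPUT_OPS_PER_S, EFFICIENCY_OPS_PER_J)
    print(f"energy per op: {e * 1e15:.2f} fJ")
    print(f"implied clock: {f / 1e9:.2f} GHz")
    print(f"implied power: {p:.2f} W")
```

Under these assumptions the efficiency figure corresponds to roughly 3 fJ per operation, and the throughput figure to a per-unit clock in the multi-gigahertz range.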
Abstract
High-performance computing underpins modern artificial intelligence (AI), enabling foundation models, real-time inference and perception in autonomous systems, and data-intensive scientific simulations. Recent advances in quantization techniques, which use low-precision computation without degrading model accuracy, create new opportunities for analog photonic computing characterized by ultra-high clock rates and low energy consumption. Here we propose and demonstrate a coherent homodyne integrated circuit capable of general matrix multiplication (GEMM) with aggregate throughput that exceeds 1,000 TOPS (tera-operations per second), enabled by massive on-chip optical fanout and parallelism. By leveraging time multiplexing, the required modulator count is reduced from O(N²) to O(N), allowing dense integration of record-scale 256×256 homodyne units (each <0.0064 ) within a…
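One way to see how time multiplexing can cut the modulator count is to treat a matrix product as a sum of outer products, one per time step. The sketch below is a purely numerical illustration, not the authors' circuit. It assumes one modulator per row and one per column (2N in total) streaming values over time, with each unit (i, j) multiplying its two inputs and accumulating the result, which is the role this sketch assigns to a homodyne detector and integrator. The baseline of one modulator per unit (N² in total) is also an assumption, and all names are hypothetical.

```python
def time_multiplexed_matmul(A, B):
    """Compute C = A @ B by accumulating one outer product per time step.

    A is N x T: row i is the time series sent by row modulator i.
    B is T x M: column j is the time series sent by column modulator j.
    Unit (i, j) accumulates A[i][t] * B[t][j] over all time steps t.
    """
    n, steps, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for t in range(steps):            # one clock cycle per time step
        for i in range(n):            # row modulator i broadcasts A[i][t]
            a = A[i][t]
            for j in range(m):        # column modulator j broadcasts B[t][j]
                C[i][j] += a * B[t][j]
    return C


def modulator_counts(n):
    """Modulators needed for an n x n array under the two assumed layouts."""
    return {"one_per_unit": n * n, "time_multiplexed": 2 * n}


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(time_multiplexed_matmul(A, B))
    print(modulator_counts(256))
```

In this picture every unit in the array receives data from shared row and column sources, which is consistent with the on-chip optical fanout the abstract describes, while the number of active sources grows linearly with the array side.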
