Chip-to-chip photonic connectivity in multi-accelerator servers for ML

Abhishek Vijaya Kumar; Arjun Devraj; Darius Bunandar; Rachee Singh

arXiv:2501.18169·cs.NI·January 31, 2025

TL;DR

This paper introduces a rack-scale ML architecture built on chip-to-chip silicon photonics, achieving faster collective communication and higher training throughput in multi-accelerator servers.

Contribution

The paper presents a multi-accelerator server architecture with chip-to-chip silicon photonic connectivity that supports multi-tenant resource slicing without fragmentation and improves ML training performance.

Findings

01 74% faster rack-scale collective communication

02 1.7X increase in end-to-end ML training throughput

03 Multi-tenant resource slicing without fragmentation

Abstract

We present a rack-scale compute architecture for ML using multi-accelerator servers connected via chip-to-chip silicon photonic components. Our architecture achieves (1) multi-tenanted resource slicing without fragmentation, (2) 74% faster rack-scale collective communication, and (3) 1.7X speedup in end-to-end ML training throughput.

Taxonomy

Topics: Photonic and Optical Devices · Advanced Photonic Communication Systems · Optical Network Technologies