Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets
Zhuoping Yang, Shixin Ji, Xingzhen Chen, Jinming Zhuang, Weifeng Zhang, Dharmesh Jani, Peipei Zhou

TL;DR
This paper explores the potential and challenges of using heterogeneous chiplet architectures to improve large-scale AI computing, focusing on interconnect, standardization, packaging, security, and software issues.
Contribution
It provides a comprehensive analysis of the opportunities and technical challenges in deploying large-scale heterogeneous chiplet systems for AI workloads.
Findings
Chiplet architectures improve cost efficiency and reduce time to market.
Interconnect, standardization, and security are key challenges in chiplet adoption.
Software programming for heterogeneous chiplet systems presents significant hurdles.
Abstract
Fast-evolving artificial intelligence (AI) algorithms such as large language models have been driving the ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities when scaling up and scaling out the computing system. In particular, heterogeneous chiplet architecture is favored to keep scaling up and scaling out the system as well as to reduce the design complexity and the cost stemming from the traditional monolithic chip design. However, how to interconnect computing resources and orchestrate heterogeneous chiplets is the key to success. In this paper, we first discuss the diversity and evolving demands of different AI workloads. We discuss how chiplets bring better cost efficiency and shorter time to market. Then we discuss the challenges in establishing chiplet interface standards,…
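The cost argument against large monolithic dies is commonly illustrated with a die-yield model. The sketch below is not taken from the paper. It assumes a simple Poisson yield model, Y = exp(-A·D0), a hypothetical 800 mm² design, and a hypothetical defect density of 0.1 defects/cm². It also ignores packaging, die-to-die interface area, test cost, and wafer-edge losses. It compares the silicon cost of building the same logic as one die versus several equal-size chiplets.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson yield model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_cost_per_system(total_area_mm2: float, n_chiplets: int,
                            defects_per_mm2: float, cost_per_mm2: float = 1.0) -> float:
    """Silicon cost of one system built from n equal-size dies.

    Assumes every die is tested before assembly (known-good die), so each good
    die costs its area divided by its yield. Packaging, die-to-die interface
    area, test cost, and wafer-edge losses are ignored (illustrative assumptions).
    """
    die_area = total_area_mm2 / n_chiplets
    cost_per_good_die = die_area * cost_per_mm2 / poisson_yield(die_area, defects_per_mm2)
    return n_chiplets * cost_per_good_die


if __name__ == "__main__":
    TOTAL_AREA_MM2 = 800.0    # hypothetical total logic area
    DEFECTS_PER_MM2 = 0.001   # hypothetical defect density (0.1 defects per cm^2)
    baseline = silicon_cost_per_system(TOTAL_AREA_MM2, 1, DEFECTS_PER_MM2)
    for n in (1, 2, 4, 8):
        cost = silicon_cost_per_system(TOTAL_AREA_MM2, n, DEFECTS_PER_MM2)
        print(f"{n} die(s): silicon cost relative to monolithic = {cost / baseline:.2f}")
```

Under these assumptions the relative silicon cost falls as the design is split, because each smaller die is less likely to contain a defect. The overheads this toy model leaves out (interface area, advanced packaging, assembly yield) relate directly to the interconnect and packaging challenges the paper discusses.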
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
