Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models

Huwan Peng; Scott Davidson; Richard Shi; Shuaiwen Leon Song; Michael Taylor

arXiv:2307.02666·cs.AR·May 22, 2024·5 cites

Open Access

TL;DR

This paper introduces Chiplet Cloud, a scalable, cost-efficient ASIC architecture for large language model serving, achieving significant TCO improvements over traditional GPU and TPU cloud solutions.

Contribution

It proposes a novel chiplet-based architecture with a specialized memory system and a co-design methodology to optimize LLM serving performance and cost.

Findings

01. 97x TCO reduction compared to rented GPU clouds

02. 18x TCO reduction compared to rented TPU clouds

03. Supports 1.7x larger models with 60% sparsity

Abstract

Large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini have demonstrated unprecedented capabilities of autoregressive AI models across multiple tasks, triggering disruptive technology innovations around the world. However, as models continue to grow, the cost to serve them also continues to grow, threatening the democratization of LLMs. To address this issue, we propose Chiplet Cloud, a chiplet-based ASIC LLM-supercomputer architecture whose goal is to optimize the total cost of ownership (TCO) per generated token. This architecture is a highly parameterizable ASIC and server-level architecture leveraging thousands of replicated accelerator modules collaborating to scale up the performance of LLMs at cloud scale. To determine specific parameterizations of the Chiplet Cloud architecture, we implemented a two-phase hardware-software co-design methodology that can…

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Natural Language Processing Techniques