Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine

Zuoning Zhang; Dhruv Parikh; Youning Zhang; Viktor Prasanna

arXiv:2409.00287 · cs.DC · September 24, 2024

TL;DR

This paper evaluates the Cerebras Wafer Scale Engine's hardware capabilities in accelerating large language models' training and inference, analyzing scalability, memory bandwidth, and performance through benchmarking and roofline modeling.
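The roofline model mentioned here bounds a kernel's attainable throughput by the smaller of two ceilings: peak compute, and memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The minimal Python sketch below illustrates that bound. The function names are illustrative, and the peak-compute and bandwidth figures are hypothetical placeholders chosen for easy arithmetic, not Cerebras WSE specifications or the paper's measured values.

```python
# Roofline model sketch. Machine numbers below are HYPOTHETICAL placeholders,
# not Cerebras WSE specifications.

def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline bound in FLOP/s: min(peak compute, bandwidth * arithmetic intensity).

    peak_flops    -- peak compute throughput (FLOP/s)
    mem_bandwidth -- sustained memory bandwidth (bytes/s)
    intensity     -- kernel arithmetic intensity (FLOP/byte)
    """
    return min(peak_flops, mem_bandwidth * intensity)


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the memory and compute roofs meet."""
    return peak_flops / mem_bandwidth


def classify(peak_flops: float, mem_bandwidth: float, intensity: float) -> str:
    """Label a kernel by which roof limits it."""
    if intensity < ridge_point(peak_flops, mem_bandwidth):
        return "memory-bound"
    return "compute-bound"


PEAK = 100e12  # hypothetical: 100 TFLOP/s
BW = 2e12      # hypothetical: 2 TB/s, so the ridge point is 50 FLOP/byte

if __name__ == "__main__":
    for ai in (1.0, 10.0, 50.0, 200.0):
        perf = attainable_flops(PEAK, BW, ai)
        print(f"AI={ai:6.1f} FLOP/B -> {perf / 1e12:6.1f} TFLOP/s ({classify(PEAK, BW, ai)})")
```

With these placeholder numbers, kernels below 50 FLOP/byte are capped by bandwidth. Raising bandwidth moves the ridge point left, so more kernels reach the compute roof; this is the sense in which higher memory bandwidth eases the memory wall.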

Contribution

The paper provides the first comprehensive benchmarking of LLMs on the Cerebras WSE, demonstrating the system's potential to handle both memory-bound and compute-intensive NLP and CV tasks.

Findings

01. Cerebras WSE significantly accelerates LLM training and inference.

02. The system effectively mitigates the memory wall through its high-bandwidth on-chip memory.

03. Performance scales well with model size and computational intensity.
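One way to see why performance tracks computational intensity: the arithmetic intensity of a matrix multiply grows with matrix size, while a matrix-vector product stays memory-bound at any size. The sketch below computes intensity for C[m,n] = A[m,k] @ B[k,n] under an idealized assumption (each operand read once, output written once, fp16 elements by default); `gemm_intensity` is a hypothetical helper, and real kernels move more data than this lower bound.

```python
def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity (FLOP/byte) of C[m,n] = A[m,k] @ B[k,n].

    Assumes an ideal schedule: A and B are each read once and C is written
    once, so bytes_moved is a lower bound. Each of the m*n outputs needs k
    multiply-adds, i.e. 2*k FLOPs.
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    for d in (256, 1024, 4096):
        # Square GEMM vs. matrix-vector product (n = 1) of the same width.
        print(f"GEMM {d}^3: {gemm_intensity(d, d, d):8.1f} FLOP/B   "
              f"GEMV {d}x{d}: {gemm_intensity(d, 1, d):.3f} FLOP/B")
```

Under these assumptions a square fp16 GEMM has intensity n/3 FLOP/byte, so larger matrices sit further right on the roofline, whereas a matrix-vector product stays below 1 FLOP/byte regardless of size and is bounded by memory bandwidth.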

Abstract

Transformer-based Large Language Models (LLMs) have recently reached state-of-the-art performance in Natural Language Processing (NLP) and Computer Vision (CV) domains. LLMs use the Multi-Headed Self-Attention (MHSA) mechanism to capture long-range global attention relationships among input words or image patches, drastically improving their performance over prior deep learning approaches. In this paper, we evaluate the performance of LLMs on the Cerebras Wafer Scale Engine (WSE). Cerebras WSE is a high-performance computing system with 2.6 trillion transistors, 850,000 cores, and 40 GB of on-chip memory. Cerebras WSE's Sparse Linear Algebra Compute (SLAC) cores eliminate multiply-by-zero operations, and its 40 GB of on-chip memory is uniformly distributed among SLAC cores, enabling fast local access to model parameters. Moreover, Cerebras software configures routing between cores at…
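The multiply-by-zero elimination attributed to SLAC cores can be illustrated with a software analogy: a dot product that skips any term with a zero operand and counts how many multiplies it actually performs. This is a minimal sketch under our own assumptions, not the WSE's hardware mechanism or any Cerebras API; `sparse_dot` and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def sparse_dot(weights: Sequence[float], activations: Sequence[float]) -> Tuple[float, int]:
    """Dot product that skips every multiply with a zero operand.

    Returns (result, multiplies_performed). For finite inputs, skipping is exact:
    a term with a zero factor adds 0 to the sum, so only wasted work is removed.
    Software analogy only; this is not how the WSE hardware is programmed.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # multiply-by-zero: skipped instead of executed
        total += w * a
        multiplies += 1
    return total, multiplies


if __name__ == "__main__":
    # Hypothetical data: half the weights and one activation are zero (e.g. after ReLU).
    w = [0.5, 0.0, 2.0, 0.0]
    x = [4.0, 3.0, 0.0, 1.0]
    result, used = sparse_dot(w, x)
    print(f"result={result}, multiplies={used} of {len(w)} dense")
```

Here only one of four multiplies is executed and the result is unchanged, which is the saving the abstract describes: the sparser the operands, the more work is skipped.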

Taxonomy

Topics: Natural Language Processing Techniques