LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

Krishna Teja Chitty-Venkata; Siddhisanket Raskar; Bharat Kale; Farah Ferdaus; Aditya Tanikanti; Ken Raffenetti; Valerie Taylor; Murali Emani; Venkatram Vishwanath

arXiv:2411.00136 · cs.LG · November 4, 2024 · 2 cites

TL;DR

This paper introduces LLM-Inference-Bench, a benchmarking suite for evaluating the inference performance of large language models across diverse hardware platforms, aiding in understanding scalability and optimizing configurations.

Contribution

It presents a comprehensive benchmarking framework and analysis of LLM inference performance on multiple hardware accelerators and models, highlighting their strengths and limitations.

Findings

01 Nvidia and AMD GPUs show different performance profiles.

02 Specialized AI accelerators like Habana and SambaNova have unique strengths.

03 Benchmarking results guide optimal hardware and model configurations.

Abstract

Large Language Models (LLMs) have propelled groundbreaking advancements across several domains and are commonly used for text generation applications. However, the computational demands of these complex models pose significant challenges, requiring efficient hardware acceleration. Benchmarking the performance of LLMs across diverse hardware platforms is crucial to understanding their scalability and throughput characteristics. We introduce LLM-Inference-Bench, a comprehensive benchmarking suite to evaluate the hardware inference performance of LLMs. We thoroughly analyze diverse hardware platforms, including GPUs from Nvidia and AMD and specialized AI accelerators, Intel Habana and SambaNova. Our evaluation includes several LLM inference frameworks and models from LLaMA, Mistral, and Qwen families with 7B and 70B parameters. Our benchmarking results reveal the strengths and limitations…
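
The abstract refers to throughput characteristics without defining the metric in this excerpt. As a minimal sketch, assuming throughput means total tokens (prompt plus generated) processed per second of end-to-end latency, a single benchmark run could be summarized as follows. The `RunResult` type, its field names, and the example numbers are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One hypothetical benchmark run; all field names are illustrative."""
    batch_size: int       # number of requests processed together
    input_tokens: int     # prompt length per request
    output_tokens: int    # generated length per request
    latency_s: float      # end-to-end wall-clock time for the whole batch


def throughput_tokens_per_s(run: RunResult) -> float:
    """Total tokens (prompt + generated) processed per second across the batch.

    Assumed definition: batch_size * (input + output) / end-to-end latency.
    """
    if run.latency_s <= 0:
        raise ValueError("latency_s must be positive")
    total_tokens = run.batch_size * (run.input_tokens + run.output_tokens)
    return total_tokens / run.latency_s


# Made-up numbers, only to show the arithmetic: 16 * (128 + 128) / 4.0
example = RunResult(batch_size=16, input_tokens=128, output_tokens=128, latency_s=4.0)
print(throughput_tokens_per_s(example))
```

Under this assumed definition, comparing two accelerators is only meaningful when batch size and input/output lengths are held fixed, since each of them changes the token count in the numerator.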

Code & Models

Repositories

argonne-lcf/llm-inference-bench (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: LLaMA