LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath

TL;DR
This paper introduces LLM-Inference-Bench, a benchmarking suite for evaluating the inference performance of large language models across diverse hardware platforms, aiding in understanding scalability and optimizing configurations.
Contribution
It presents a comprehensive benchmarking framework and analysis of LLM inference performance on multiple hardware accelerators and models, highlighting their strengths and limitations.
Findings
Nvidia and AMD GPUs show different performance profiles.
Specialized AI accelerators like Habana and SambaNova have unique strengths.
Benchmarking results guide optimal hardware and model configurations.
Abstract
Large Language Models (LLMs) have propelled groundbreaking advancements across several domains and are commonly used for text generation applications. However, the computational demands of these complex models pose significant challenges, requiring efficient hardware acceleration. Benchmarking the performance of LLMs across diverse hardware platforms is crucial to understanding their scalability and throughput characteristics. We introduce LLM-Inference-Bench, a comprehensive benchmarking suite to evaluate the hardware inference performance of LLMs. We thoroughly analyze diverse hardware platforms, including GPUs from Nvidia and AMD and specialized AI accelerators, Intel Habana and SambaNova. Our evaluation includes several LLM inference frameworks and models from LLaMA, Mistral, and Qwen families with 7B and 70B parameters. Our benchmarking results reveal the strengths and limitations…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: LLaMA
