Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI

Saicharan Kolluru

arXiv:2511.17593·cs.LG·November 25, 2025

Open Access

TL;DR

This paper empirically compares vLLM and HuggingFace TGI for LLM inference, highlighting their performance differences in throughput, latency, and scalability across various deployment scenarios.

Contribution

It benchmarks vLLM and TGI on throughput, end-to-end latency, GPU memory utilization, and scalability using LLaMA-2 models from 7B to 70B parameters, and uses the results to guide system selection based on workload needs.

Findings

01

vLLM achieves up to 24x higher throughput than TGI under high-concurrency workloads.

02

TGI has lower tail latencies in interactive single-user scenarios.

03

Performance varies significantly with workload and model size.

Abstract

The deployment of Large Language Models (LLMs) in production environments requires efficient inference serving systems that balance throughput, latency, and resource utilization. This paper presents a comprehensive empirical evaluation of two prominent open-source LLM serving frameworks: vLLM and HuggingFace Text Generation Inference (TGI). We benchmark these systems across multiple dimensions including throughput performance, end-to-end latency, GPU memory utilization, and scalability characteristics using LLaMA-2 models ranging from 7B to 70B parameters. Our experiments reveal that vLLM achieves up to 24x higher throughput than TGI under high-concurrency workloads through its novel PagedAttention mechanism, while TGI demonstrates lower tail latencies for interactive single-user scenarios. We provide detailed performance profiles for different deployment scenarios and offer practical…
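As a rough illustration of the two headline metrics in the abstract, the sketch below computes aggregate token throughput and tail latency from per-request timing records. It is not the paper's benchmarking harness: the record format `(start_s, end_s, output_tokens)`, the function names, and the sample values are all assumptions made for this example, and the percentile uses the simple nearest-rank definition.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests):
    """Summarize a run.

    requests: list of (start_s, end_s, output_tokens) tuples.
    This record format is hypothetical, chosen only for illustration.
    """
    latencies = [end - start for start, end, _ in requests]
    # Wall-clock span of the whole run, from first request start to last request end.
    wall_clock = max(end for _, end, _ in requests) - min(start for start, _, _ in requests)
    total_tokens = sum(tokens for _, _, tokens in requests)
    return {
        "throughput_tok_per_s": total_tokens / wall_clock,
        "p50_latency_s": percentile(latencies, 50),
        "p99_latency_s": percentile(latencies, 99),
    }


# Made-up records for illustration only; these are not measurements from the paper.
sample = [(0.0, 2.0, 100), (0.0, 4.0, 200), (1.0, 3.0, 100), (2.0, 10.0, 400)]
summary = summarize(sample)
print(summary)
```

Throughput here is measured over the whole run, so it rewards overlapping requests, while the percentiles are per-request. That distinction is why one system can lead on throughput under high concurrency and another on tail latency for a single interactive user.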

Taxonomy

Topics: Topic Modeling · Big Data and Digital Economy · Natural Language Processing Techniques