LLM Inference Serving: Survey of Recent Advances and Opportunities

Baolin Li; Yankai Jiang; Vijay Gadepally; Devesh Tiwari

arXiv:2407.12391 · cs.DC · July 18, 2024 · 1 citation

Open Access

TL;DR

This survey reviews recent system-level advancements in LLM inference serving since 2023, emphasizing performance and efficiency improvements for real-world deployment without changing core decoding methods.

Contribution

It provides a comprehensive overview of recent research, highlighting key innovations and practical considerations for deploying and scaling LLMs in production environments.

Findings

01. Identifies recent system-level enhancements for LLM serving

02. Highlights performance and efficiency improvements

03. Provides practical deployment insights

Abstract

This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since the year 2023. We specifically examine system-level enhancements that improve performance and efficiency without altering the core LLM decoding mechanisms. By selecting and reviewing high-quality papers from prestigious ML and system venues, we highlight key innovations and practical considerations for deploying and scaling LLMs in real-world production environments. This survey serves as a valuable resource for LLM practitioners seeking to stay abreast of the latest developments in this rapidly evolving field.

Taxonomy

Topics: Data Quality and Management