A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Sihyeong Park; Sungryeol Jeon; Chaelyn Lee; Seokhun Jeon; Byung-Soo Kim; Jemin Lee

arXiv:2505.01658·cs.CL·November 27, 2025

TL;DR

This survey comprehensively evaluates 25 inference engines for large language models, analyzing their features, optimization techniques, ecosystem maturity, and future research directions to aid in selecting suitable solutions.

Contribution

It provides a systematic comparison of open-source and commercial inference engines, highlighting their design goals, supported optimizations, and ecosystem maturity.

Findings

01 Most engines support parallelism and caching techniques.

02 Commercial engines offer better scalability and cost policies.

03 Open-source engines vary widely in ease-of-use and deployment.

Abstract

Large language models (LLMs) are widely applied in chatbots, code generators, and search engines. Workloads such as chain-of-thought, complex reasoning, and agent services significantly increase inference cost by invoking the model repeatedly. Optimization methods such as parallelism, compression, and caching have been adopted to reduce costs, but diverse service requirements make it hard to select the right method. Recently, specialized LLM inference engines have emerged as a key component for integrating these optimization methods into service-oriented infrastructures. However, a systematic study of inference engines is still lacking. This paper provides a comprehensive evaluation of 25 open-source and commercial inference engines. We examine each inference engine in terms of ease-of-use, ease-of-deployment, general-purpose support, scalability, and suitability for throughput- and…

Code & Models

Repositories

sihyeong/awesome-llm-inference-engine (Official)

Taxonomy

Topics: Big Data and Digital Economy · Machine Learning in Materials Science · Natural Language Processing Techniques
