Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda

Minxian Xu; Jingfeng Wu; Shengye Song; Satish Narayana Srirama; Bahman Javad; Rajiv Ranjan; Devki Nandan Jha; Sa Wang; Wenhong Tian; Huanle Xu; Li Li; Zizhao Mo; Shuo Ren; Thomas Kunz; Petar Kochovski; Vlado Stankovski; Kejiang Ye; Chengzhong Xu; Rajkumar Buyya

arXiv:2604.17227 · cs.DC · April 21, 2026

TL;DR

This paper discusses how cloud-native and distributed systems can address the scalability and efficiency challenges of deploying large language models, outlining research directions and technological solutions.

Contribution

The paper provides a comprehensive research agenda for integrating cloud and distributed architectures to support scalable, efficient LLM deployment and innovation.

Findings

01. Highlights the importance of microservices and autoscaling for LLM deployment

02. Explores emerging trends like serverless inference and federated learning

03. Proposes a roadmap for future research and standardization in LLM systems

Abstract

The rapid rise of Large Language Models (LLMs) has revolutionized various artificial intelligence (AI) applications, from natural language processing to code generation. However, the computational demands of these models, particularly in training and inference, present significant challenges. Traditional systems are often unable to meet these requirements, necessitating the integration of cloud-native and distributed architectures. This paper explores the role of cloud platforms and distributed systems in supporting the scalability, efficiency, and optimization of LLMs. We discuss the complexities of LLM deployment, including data management, resource optimization, and the need for microservices, autoscaling, and hybrid cloud-edge solutions. Additionally, we examine emerging research trends, such as serverless inference, quantum computing, and federated learning, and their potential to…
