Ensuring Fair LLM Serving Amid Diverse Applications
Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan

TL;DR
This paper introduces FairServe, a system designed to ensure fair access to large language models in multi-tenant platforms by accounting for application-specific request characteristics and preventing abuse.
Contribution
The paper develops FairServe, a novel fairness system that incorporates application-aware throttling and weighted scheduling, addressing limitations of existing methods in multi-tenant LLM serving.
Findings
FairServe outperforms existing fairness methods in real-world tests.
Application-aware request management improves fairness across diverse applications.
System deployment is underway to benefit millions of users.
Abstract
In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe proposes application-characteristic aware request throttling coupled with a weighted service counter based scheduling technique to curb abusive behavior and…
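To make the scheduling idea in the abstract more concrete, here is a minimal sketch of a weighted service counter scheduler. It is not FairServe's actual algorithm, which this summary does not specify. The class name, the token weights, and the per-application normalization factor are all assumptions chosen for illustration. The idea shown is only the general one: track how much weighted service each user has received and serve the backlogged user who has received the least.

```python
from collections import defaultdict, deque


class WeightedServiceCounterScheduler:
    """Toy scheduler (hypothetical, not the paper's implementation).

    Each user has a counter of weighted service received so far. The next
    request served always belongs to the backlogged user with the smallest
    counter, so a user who floods the queue cannot starve others.
    """

    def __init__(self, app_weights=None, input_weight=1.0, output_weight=2.0):
        # app_weights: assumed per-application normalization, e.g. a larger
        # value for applications whose requests are naturally long.
        self.app_weights = app_weights or {}
        self.input_weight = input_weight    # assumed cost per input token
        self.output_weight = output_weight  # assumed cost per output token
        self.counters = defaultdict(float)  # user -> weighted service received
        self.queues = defaultdict(deque)    # user -> pending requests (FIFO)

    def submit(self, user, app, input_tokens, output_tokens):
        self.queues[user].append((app, input_tokens, output_tokens))

    def next_request(self):
        backlogged = [u for u, q in self.queues.items() if q]
        if not backlogged:
            return None
        # Least-served user first; ties are broken by user name.
        user = min(backlogged, key=lambda u: (self.counters[u], u))
        app, inp, out = self.queues[user].popleft()
        cost = self.input_weight * inp + self.output_weight * out
        # Divide by the application weight so that applications with long
        # requests are not penalized relative to short-request applications.
        self.counters[user] += cost / self.app_weights.get(app, 1.0)
        return user, app


sched = WeightedServiceCounterScheduler(app_weights={"chat": 1.0})
for _ in range(3):
    sched.submit("alice", "chat", 100, 100)  # alice floods the queue
sched.submit("bob", "chat", 100, 100)
order = [sched.next_request()[0] for _ in range(4)]
print(order)
```

In this sketch, bob's single request is served second even though alice submitted three requests first, because alice's counter grows after her first request is served.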
Taxonomy
Topics: Law, AI, and Intellectual Property
Methods: travel james · Attentive Walk-Aggregating Graph Neural Network
