Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
Jiachen Liu, Jae-Won Chung, Zhiyu Wu, Fan Lai, Myungjin Lee, Mosharaf Chowdhury

TL;DR
Andes is a QoE-aware LLM serving system that improves user experience in text streaming by dynamically scheduling requests to optimize perceived quality and resource efficiency.
Contribution
We define QoE for text streaming services and develop Andes, a system that enhances user experience through dynamic request prioritization based on QoE metrics.
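This page does not spell out the QoE formula. The following is therefore only a minimal sketch of one way to score a streaming timeline, under stated assumptions. The user expects the first token after `ttft` seconds and then reads at most `tds` tokens per second. Tokens that arrive in a burst faster than that reading speed bring no extra benefit. QoE is taken as the area under the "tokens digested so far" curve divided by the area under the expected curve, clipped to 1. All function and parameter names are hypothetical, and the paper's exact definition may differ.

```python
from typing import List, Sequence


def expected_times(n: int, ttft: float, tds: float) -> List[float]:
    # Hypothetical expected timeline: first token at `ttft` seconds, then one
    # token every 1/tds seconds (tds = assumed reading speed in tokens/s).
    return [ttft + i / tds for i in range(n)]


def digested_times(arrivals: Sequence[float], tds: float) -> List[float]:
    # A reader can consume a token only after it arrives and at most one token
    # per 1/tds seconds, so bursts faster than reading speed bring no benefit.
    # `arrivals` is assumed to be sorted in ascending order.
    times: List[float] = []
    prev = float("-inf")
    for a in arrivals:
        prev = max(a, prev + 1.0 / tds)
        times.append(prev)
    return times


def area_under_count(times: Sequence[float], horizon: float) -> float:
    # Integral over [0, horizon] of the step function t -> #{i : times[i] <= t}.
    return sum(max(0.0, horizon - t) for t in times)


def qoe(arrivals: Sequence[float], ttft: float, tds: float) -> float:
    """Digested-token area over expected-token area, clipped to at most 1."""
    if not arrivals or tds <= 0:
        raise ValueError("need at least one token and a positive reading speed")
    expected = expected_times(len(arrivals), ttft, tds)
    digested = digested_times(arrivals, tds)
    # Common horizon: one reading interval past whichever timeline ends last.
    horizon = max(expected[-1], digested[-1]) + 1.0 / tds
    ratio = area_under_count(digested, horizon) / area_under_count(expected, horizon)
    return min(1.0, ratio)
```

With `ttft=1.0` and `tds=2.0`, consider three tokens that all arrive at once at 1.0 s. They still score 1.0, because the reader digests them at the expected pace. The same three tokens delayed by 2 s score 1/3. This illustrates why, under these assumptions, a prompt first token and a steady pace matter more than raw throughput.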
Findings
Up to 4.7x improvement in average QoE over state-of-the-art systems.
Can save up to 61% of GPU resources while maintaining high QoE.
Effective request scheduling based on QoE gain enhances streaming performance.
Abstract
Large language models (LLMs) are now at the core of conversational AI services such as real-time translation and chatbots, which provide live user interaction by incrementally streaming text to the user. However, existing LLM serving systems fail to provide good user experience because their optimization metrics are not always aligned with user experience. In this paper, we first introduce and define the notion of Quality-of-Experience (QoE) for text streaming services by considering each user's end-to-end interaction timeline. Based on this, we propose Andes, a QoE-aware LLM serving system that enhances user experience by ensuring that users receive the first token promptly and subsequent tokens at a smooth, digestible pace, even during surge periods. This is enabled by Andes's preemptive request scheduler that dynamically prioritizes requests at the token granularity based on each…
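The abstract describes a preemptive scheduler that re-prioritizes requests at token granularity, and the findings mention scheduling by QoE gain. What follows is a simplified sketch of that idea, not the Andes scheduler itself. It assumes each request arrives with two predicted QoE values, one for being served during the next interval and one for being preempted. It also assumes a memory cost measured in context tokens. A greedy knapsack heuristic ranks requests by QoE gain per token of memory and fills the batch until capacity runs out. `Request`, `select_batch`, and every field name are hypothetical, and the paper's objective, gain estimator, and constraints may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    rid: str
    context_tokens: int    # hypothetical memory cost (KV-cache size in tokens)
    qoe_if_served: float   # hypothetical predicted QoE if scheduled next interval
    qoe_if_paused: float   # hypothetical predicted QoE if preempted next interval

    @property
    def gain(self) -> float:
        # QoE gained by serving this request now rather than preempting it.
        return self.qoe_if_served - self.qoe_if_paused


def select_batch(requests: List[Request], capacity_tokens: int) -> List[str]:
    """Greedy knapsack: take requests by QoE gain per memory token until full."""
    ranked = sorted(
        requests,
        key=lambda r: r.gain / max(r.context_tokens, 1),
        reverse=True,
    )
    chosen: List[str] = []
    used = 0
    for r in ranked:
        # Skip requests that do not fit; anything not chosen is preempted.
        if used + r.context_tokens <= capacity_tokens:
            chosen.append(r.rid)
            used += r.context_tokens
    return chosen
```

A serving loop would call `select_batch` again before each decoding step. That lets a request whose QoE is about to drop preempt one that has already streamed well ahead of its reader. The greedy ratio rule is a standard knapsack approximation chosen here for brevity.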