Orchestration for Domain-specific Edge-Cloud Language Models
Prasoon Patidar (1), Alex Crown (2), Kevin Hsieh (2), Yifei Xu (2), Tusher Chakraborty (2), Ranveer Chandra (2), Yuvraj Agarwal (1) ((1) Carnegie Mellon University, (2) Microsoft Research)

TL;DR
ECO-LLM is a system that optimizes edge-cloud collaboration for language models by dynamically selecting configurations at query time, significantly improving accuracy, reducing costs, and meeting latency constraints.
Contribution
The paper introduces a joint optimization framework, together with a system that uses query clustering and dynamic configuration selection, to make LLM deployment in edge-cloud environments more efficient.
Findings
Outperforms GPT-4o in accuracy (90% vs. 74%)
Reduces costs by up to 90%
Decreases latency by up to 55%
Abstract
The remarkable performance of Large Language Models (LLMs) has inspired many applications, which often necessitate edge-cloud collaboration due to connectivity, privacy, and cost considerations. Traditional methods primarily focus on selecting the best LLM model for optimizing performance, while neglecting the critical interplay between the components of the LLM serving pipeline (context retrieval, query preprocessing, etc.) or the changing latency and cost constraints. We introduce ECO-LLM (Edge-Cloud Orchestrator for LLMs), a novel system that reframes this problem as a joint optimization challenge and solves it by systematically exploring component configurations and dynamically selecting optimal strategies at the query level. ECO-LLM consists of two components: (1) the ECO-LLM Emulator, which efficiently explores the vast configuration space utilizing query clustering and…
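As a rough illustration of the query-level selection the abstract describes, the sketch below picks one pipeline configuration per query under latency and cost budgets. It is a minimal sketch under stated assumptions, not ECO-LLM's actual algorithm: the `Config` fields, the `select_config` function, the configuration names, and all numbers are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One hypothetical pipeline configuration (model placement, retrieval, etc.)."""
    name: str
    est_accuracy: float   # estimated answer quality in [0, 1] (made-up values)
    est_latency_s: float  # estimated end-to-end latency in seconds
    est_cost_usd: float   # estimated cost per query in US dollars


def select_config(
    candidates: List[Config],
    max_latency_s: float,
    max_cost_usd: float,
) -> Optional[Config]:
    """Return the most accurate candidate that fits both budgets.

    Ties on accuracy are broken in favor of the cheaper configuration.
    Returns None when no candidate satisfies the constraints.
    """
    feasible = [
        c for c in candidates
        if c.est_latency_s <= max_latency_s and c.est_cost_usd <= max_cost_usd
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.est_accuracy, -c.est_cost_usd))


# Illustrative candidates only; names and estimates are assumptions.
CANDIDATES = [
    Config("edge-small-no-retrieval", 0.62, 0.4, 0.0),
    Config("edge-small-with-retrieval", 0.78, 0.9, 0.0),
    Config("cloud-large-with-retrieval", 0.91, 2.5, 0.012),
]

if __name__ == "__main__":
    # A tight latency budget keeps the query on the edge.
    print(select_config(CANDIDATES, max_latency_s=1.0, max_cost_usd=0.01))
    # A looser budget allows the cloud configuration.
    print(select_config(CANDIDATES, max_latency_s=3.0, max_cost_usd=0.05))
```

In this simplified form the estimates are fixed per configuration; a system like the one described would presumably derive them per query or per query cluster, which this sketch does not attempt to model.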
Taxonomy
Topics: Recommender Systems and Techniques · Scientific Computing and Data Management · Cloud Computing and Resource Management
