Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
Roberto Morabito, SiYoung Jang

TL;DR
This paper explores collaborative inference strategies that combine edge and cloud resources to improve the deployment of generative AI, addressing latency, cost, and privacy issues while leveraging small and large language models.
Contribution
It introduces new cooperation strategies and design principles for deploying generative AI across edge and cloud environments, supported by experimental insights.
Findings
Collaborative inference can reduce latency and cost relative to cloud-only deployment.
Edge-cloud cooperation can improve privacy and resource efficiency.
The article provides practical design principles for deploying GenAI across the computing continuum.
Abstract
The rapid adoption of generative AI (GenAI), particularly Large Language Models (LLMs), has exposed critical limitations of cloud-centric deployments, including latency, cost, and privacy concerns. Meanwhile, Small Language Models (SLMs) are emerging as viable alternatives for resource-constrained edge environments, though they often lack the capabilities of their larger counterparts. This article explores the potential of collaborative inference systems that leverage both edge and cloud resources to address these challenges. By presenting distinct cooperation strategies alongside practical design principles and experimental insights, we offer actionable guidance for deploying GenAI across the computing continuum.
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Materials Science
