Smaller, Smarter, Closer: The Edge of Collaborative Generative AI

Roberto Morabito; SiYoung Jang

arXiv:2505.16499 · cs.DC · May 30, 2025

Open Access

TL;DR

This paper explores collaborative inference strategies that combine edge and cloud resources for deploying generative AI, using both small and large language models to address the latency, cost, and privacy limitations of cloud-centric deployments.

Contribution

It introduces new cooperation strategies and design principles for deploying generative AI across edge and cloud environments, supported by experimental insights.

Findings

01. Collaborative inference reduces latency and costs.

02. Edge-cloud cooperation enhances privacy and resource efficiency.

03. Practical deployment guidelines are provided.

Abstract

The rapid adoption of generative AI (GenAI), particularly Large Language Models (LLMs), has exposed critical limitations of cloud-centric deployments, including latency, cost, and privacy concerns. Meanwhile, Small Language Models (SLMs) are emerging as viable alternatives for resource-constrained edge environments, though they often lack the capabilities of their larger counterparts. This article explores the potential of collaborative inference systems that leverage both edge and cloud resources to address these challenges. By presenting distinct cooperation strategies alongside practical design principles and experimental insights, we offer actionable guidance for deploying GenAI across the computing continuum.

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Materials Science