CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration

Pengyan Zhu; Tingting Yang

arXiv:2505.14085 · cs.NI · May 21, 2025

Open Access

TL;DR

This paper introduces a cloud-edge collaborative inference framework for large language models in 6G networks, enhancing efficiency, privacy, and responsiveness by sharing semantic states and optimizing communication and computation.

Contribution

The paper proposes a novel architecture that integrates cloud LLMs with edge-deployed small language models (SLMs), featuring key-value cache reuse, cross-node scheduling, and model alignment strategies for efficient edge inference.

Findings

01. Reduces edge computational and storage overhead.

02. Lowers inference latency and improves system stability.

03. Enhances scalability and responsiveness in 6G scenarios.

Abstract

Emerging intelligent service scenarios in 6G communication impose stringent requirements for low latency, high reliability, and privacy preservation. Generative large language models (LLMs) are gradually becoming key enablers for the integration of semantic communication and computation. However, due to the limited computational resources of edge devices and the increasing complexity of heterogeneous terminal access, existing centralized inference approaches fail to meet the dual demands of response efficiency and data privacy in edge-side inference tasks. To address these challenges, this paper proposes a novel collaborative inference architecture that integrates cloud-based LLMs with edge-deployed small language models (SLMs), enabling dynamic scheduling and sharing of semantic-level intermediate states, and establishing a unified computation-communication paradigm tailored for 6G…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies