Splitwise: Collaborative Edge-Cloud Inference for LLMs via Lyapunov-Assisted DRL
Abolfazl Younesi, Abbas Shabrang Maryan, Elyas Oustad, Zahra Najafabadi Samani, Mohsen Ansari, and Thomas Fahringer

TL;DR
Splitwise is an adaptive edge-cloud inference framework for large language models that minimizes latency, energy, and accuracy loss using Lyapunov-assisted deep reinforcement learning and fine-grained partitioning.
Contribution
Splitwise introduces a DRL-based partitioning method that decomposes transformer layers into attention heads and feed-forward sub-blocks, and that guarantees queue stability and robustness under variable network conditions.
Findings
Reduces end-to-end latency by 1.4x-2.8x.
Cuts energy consumption by up to 41%.
Lowers 95th-percentile latency by 53-61%.
Abstract
Deploying large language models (LLMs) on edge devices is challenging due to their limited memory and power resources. Cloud-only inference reduces device burden but introduces high latency and cost. Static edge-cloud partitions optimize a single metric and struggle when bandwidth fluctuates. We propose Splitwise, a novel Lyapunov-assisted deep reinforcement learning (DRL) framework for fine-grained, adaptive partitioning of LLMs across edge and cloud environments. Splitwise decomposes transformer layers into attention heads and feed-forward sub-blocks, exposing more partition choices than layer-wise schemes. A hierarchical DRL policy, guided by Lyapunov optimization, jointly minimizes latency, energy consumption, and accuracy degradation while guaranteeing queue stability under stochastic workloads and variable network bandwidth. Splitwise also guarantees robustness via partition…
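To make the Lyapunov idea in the abstract concrete, the following is a minimal sketch of a drift-plus-penalty decision rule, assuming a toy cost model. It is not the paper's method: Splitwise uses a hierarchical DRL policy, whereas this sketch substitutes an exhaustive search over a handful of candidate splits. The names `Partition`, `cost_and_service`, `choose_partition`, and `update_queue`, along with every numeric constant, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical split: how many sub-blocks stay on the edge device."""
    edge_units: int   # attention heads / feed-forward sub-blocks run locally
    total_units: int  # all sub-blocks in the model


def cost_and_service(p: Partition, bandwidth_mbps: float) -> tuple[float, float]:
    """Toy cost model; every constant here is an illustrative assumption."""
    cloud_units = p.total_units - p.edge_units
    edge_ms = 4.0 * p.edge_units     # assumed slow local compute per unit
    cloud_ms = 0.5 * cloud_units     # assumed fast remote compute per unit
    payload_mbit = 2.0               # assumed activation size sent to the cloud
    # Activations cross the link only if some work is offloaded.
    transfer_ms = 1000.0 * payload_mbit / bandwidth_mbps if cloud_units else 0.0
    latency_ms = edge_ms + cloud_ms + transfer_ms
    energy_j = 0.05 * p.edge_units + 0.002 * transfer_ms
    penalty = latency_ms + 100.0 * energy_j   # weighted latency/energy cost
    service_rate = 1000.0 / latency_ms        # requests served per second
    return penalty, service_rate


def choose_partition(candidates, queue, arrivals, bandwidth_mbps, V):
    """Pick the split minimizing V * penalty + queue * (arrivals - service)."""
    def score(p):
        penalty, service = cost_and_service(p, bandwidth_mbps)
        return V * penalty + queue * (arrivals - service)
    return min(candidates, key=score)


def update_queue(queue, arrivals, service):
    """Standard backlog update: Q(t+1) = max(Q(t) + a(t) - b(t), 0)."""
    return max(queue + arrivals - service, 0.0)


candidates = [Partition(k, 8) for k in range(9)]
fast = choose_partition(candidates, queue=0.0, arrivals=5.0, bandwidth_mbps=100.0, V=1.0)
slow = choose_partition(candidates, queue=0.0, arrivals=5.0, bandwidth_mbps=1.0, V=1.0)
print(fast.edge_units, slow.edge_units)
```

Under these assumed constants, the rule offloads everything when the link is fast and keeps everything local when bandwidth collapses. The parameter `V` trades off cost against backlog: a larger `V` favors low latency and energy, while a growing `queue` pushes the choice toward splits with higher service rates. Because the toy costs are linear, a partial split never wins here; a more realistic model with memory limits or accuracy terms would be needed to exercise the fine-grained choices the abstract describes.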
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Big Data and Digital Economy
