HybridFlow: Resource-Adaptive Subtask Routing for Efficient Edge-Cloud LLM Inference

Jiangwen Dong; Jiayu Li; Tianhang Zheng; Wanyu Lin

arXiv:2512.22137 · cs.DC · January 30, 2026

Open Access

TL;DR

HybridFlow is a resource-adaptive framework for edge-cloud LLM inference that dynamically routes subtasks using their dependencies and a learned utility model, improving efficiency and reducing latency.

Contribution

It introduces a dependency-aware DAG and a learned benefit-cost utility model for dynamic, parallel subtask execution and routing in edge-cloud LLM inference.
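The benefit-cost routing idea can be illustrated with a minimal sketch. This is not the paper's learned model: the fields `est_benefit` and `est_cost`, the `trade_off` weight, and the budget check are all hypothetical stand-ins, and the estimates are supplied by hand rather than predicted.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    est_benefit: float  # hypothetical: estimated quality gain from running on the cloud
    est_cost: float     # hypothetical: estimated normalized cloud cost (tokens, API fee)


def route(subtask: Subtask, remaining_budget: float, trade_off: float = 1.0) -> str:
    """Send a subtask to the cloud only if it is affordable and its utility is positive.

    Assumed rule (not from the paper): utility = benefit - trade_off * cost.
    """
    if subtask.est_cost > remaining_budget:
        return "edge"  # cannot afford the cloud call, so stay on-device
    utility = subtask.est_benefit - trade_off * subtask.est_cost
    return "cloud" if utility > 0 else "edge"
```

In a real system the two estimates would come from a trained predictor and the budget would shrink as cloud calls are made; here they are fixed inputs so the decision rule itself stays visible.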

Findings

01 Reduces latency and cloud API usage across multiple benchmarks.

02 Maintains reasoning accuracy competitive with structured baselines.

03 Improves the cost-accuracy trade-off in edge-cloud inference.

Abstract

Edge-cloud collaborative inference is becoming a practical necessity for LLM-powered edge devices: on-device models often lack the required reasoning capability, while cloud-only inference can be prohibitively costly and slow under strict latency and token/API budgets. However, existing edge-cloud collaboration methods often route per query or at fixed steps, based simply on estimated difficulty. Such coarse and static heuristics overlook subtask dependencies, missing opportunities for parallel execution and budget-adaptive routing. To this end, we propose HybridFlow, a resource-adaptive edge-cloud inference framework that (i) builds a dependency-aware DAG for each query and executes newly unlocked subtasks in parallel, reducing end-to-end latency; (ii) routes each subtask online to the edge or cloud via a learned benefit-cost utility model that dynamically trades…
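As a rough illustration of point (i), the sketch below runs a dependency DAG so that each subtask starts as soon as all of its parents have finished. It is a generic scheduler written under stated assumptions, not the authors' implementation: `run_dag`, the `deps` mapping, and the `run_subtask` callback are hypothetical names, and threads stand in for real edge or cloud model calls.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_dag(deps, run_subtask, max_workers=4):
    """Execute subtasks in dependency order, running unlocked ones in parallel.

    deps: dict mapping each subtask name to the set of names it depends on.
    run_subtask: hypothetical callback (name, parent_results) -> result.
    """
    remaining = {t: set(d) for t, d in deps.items()}  # unmet dependencies per task
    results = {}
    in_flight = {}  # future -> subtask name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or in_flight:
            # Submit every subtask whose dependencies are all satisfied.
            ready = [t for t, d in remaining.items() if not d]
            for t in ready:
                del remaining[t]
                parent_results = {p: results[p] for p in deps[t]}
                in_flight[pool.submit(run_subtask, t, parent_results)] = t
            if not in_flight:
                raise ValueError("dependency cycle or unknown dependency")
            # Block until at least one running subtask finishes.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                t = in_flight.pop(fut)
                results[t] = fut.result()
                # Mark this subtask as satisfied for its children.
                for d in remaining.values():
                    d.discard(t)
    return results
```

For example, with `deps = {"a": set(), "b": set(), "c": {"a", "b"}}`, subtasks `a` and `b` are submitted together and `c` is unlocked only after both complete. A per-subtask routing decision could be made inside `run_subtask`, though how HybridFlow couples the two is not specified in the text above.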

Taxonomy

Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Advanced Neural Network Applications