Batching-Aware Joint Model Onloading and Offloading for Hierarchical Multi-Task Inference

Seohyeon Cha; Kevin Chan; Gustavo de Veciana; Haris Vikalo

arXiv:2508.13380 · cs.LG · August 20, 2025

TL;DR

This paper introduces J3O, a unified framework for optimizing multi-task model deployment and query routing in hierarchical edge systems, effectively balancing accuracy, memory, and communication constraints.

Contribution

It proposes a novel joint optimization algorithm for onloading and offloading multi-task models, extended to include batching, with scalable solutions for resource-constrained hierarchical inference.

Findings

01 Achieves over 97% of optimal accuracy in experiments.

02 Runs in less than 15% of the optimal solver's runtime.

03 Effectively handles multi-task, hierarchical inference scenarios.

Abstract

The growing demand for intelligent services on resource-constrained edge devices has spurred the development of collaborative inference systems that distribute workloads across end devices, edge servers, and the cloud. While most existing frameworks focus on single-task, single-model scenarios, many real-world applications (e.g., autonomous driving and augmented reality) require concurrent execution of diverse tasks including detection, segmentation, and depth estimation. In this work, we propose a unified framework to jointly decide which multi-task models to deploy (onload) at clients and edge servers, and how to route queries across the hierarchy (offload) to maximize overall inference accuracy under memory, compute, and communication constraints. We formulate this as a mixed-integer program and introduce J3O (Joint Optimization of Onloading and Offloading), an alternating algorithm…
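To make the joint decision concrete, here is a minimal sketch of the kind of problem the abstract describes: choose which models to load at a client and at an edge server under memory budgets, then route queries to maximize total accuracy under an offloading budget. It is a brute-force illustration on a toy instance, not the J3O algorithm and not the paper's formulation. The model names, memory costs, accuracies, query counts, and the functions `feasible_subsets`, `best_accuracy`, `route`, and `solve` are all hypothetical, and compute constraints and batching are left out.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): model -> (memory cost, {task: accuracy}).
MODELS = {
    "small": (1, {"detection": 0.60, "segmentation": 0.55}),
    "medium": (2, {"detection": 0.75, "segmentation": 0.70}),
    "large": (4, {"detection": 0.90, "segmentation": 0.88}),
}
# Hypothetical number of queries per task arriving at the client.
QUERIES = {"detection": 10, "segmentation": 10}


def feasible_subsets(budget):
    """Yield every set of models whose total memory fits within the budget."""
    names = list(MODELS)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(MODELS[m][0] for m in subset) <= budget:
                yield subset


def best_accuracy(subset, task):
    """Best accuracy for a task among the loaded models (0.0 if none is loaded)."""
    return max((MODELS[m][1].get(task, 0.0) for m in subset), default=0.0)


def route(client, edge, offload_budget):
    """Offload queries in order of accuracy gain until the budget is used up.

    Every query costs one unit of the offload budget in this toy setting,
    so serving the tasks with the largest per-query gain first is enough.
    """
    def gain(task):
        return best_accuracy(edge, task) - best_accuracy(client, task)

    total, plan, remaining = 0.0, {}, offload_budget
    for task in sorted(QUERIES, key=gain, reverse=True):
        local = best_accuracy(client, task)
        remote = best_accuracy(edge, task)
        sent = min(QUERIES[task], remaining) if remote > local else 0
        remaining -= sent
        plan[task] = sent
        total += sent * remote + (QUERIES[task] - sent) * local
    return total, plan


def solve(client_mem, edge_mem, offload_budget):
    """Enumerate all feasible placements and keep the one with the best routing."""
    best = (-1.0, None, None, None)
    for client in feasible_subsets(client_mem):
        for edge in feasible_subsets(edge_mem):
            total, plan = route(client, edge, offload_budget)
            if total > best[0]:
                best = (total, client, edge, plan)
    return best


if __name__ == "__main__":
    total, client, edge, plan = solve(client_mem=1, edge_mem=4, offload_budget=10)
    print(f"client={client} edge={edge} offloaded={plan} total accuracy={total:.2f}")
```

Exhaustive enumeration like this grows exponentially with the number of models and nodes, which is the scaling problem that motivates a faster alternating approach such as the one the abstract introduces.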