HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training

Aakriti Lnu; Zhe Li; Dandan Liang; Chao Huang; Rui Li; Haibo Yang

arXiv:2601.10940 · cs.LG · April 7, 2026

TL;DR

HOSL is a hybrid split learning framework that combines zeroth-order (ZO) and first-order (FO) optimization to train large language models memory-efficiently on resource-constrained edge devices while keeping accuracy close to FO baselines.

Contribution

This work presents the first hybrid approach that strategically integrates zeroth-order (ZO) and first-order (FO) optimization in split learning, reducing memory usage while preserving convergence speed and model performance.

Findings

01. HOSL reduces client GPU memory by up to 3.7× compared to FO methods.

02. HOSL achieves accuracy within 0.20%-4.23% of the FO baseline.

03. HOSL outperforms the ZO baseline by up to 15.55%.

Abstract

Split learning (SL) enables collaborative training of large language models (LLMs) between resource-constrained edge devices and compute-rich servers by partitioning model computation across the network boundary. However, existing SL systems predominantly rely on first-order (FO) optimization, which requires clients to store intermediate quantities such as activations for backpropagation. This results in substantial memory overhead, largely negating the benefits of model partitioning. In contrast, zeroth-order (ZO) optimization eliminates backpropagation and significantly reduces memory usage, but often suffers from slow convergence and degraded performance. In this work, we propose HOSL, a novel Hybrid-Order Split Learning framework that addresses this fundamental trade-off between memory efficiency and optimization effectiveness by strategically integrating ZO optimization on the client…
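The abstract as shown here does not specify which ZO estimator HOSL uses. As a generic illustration of why ZO optimization needs no backpropagation or stored activations, the sketch below uses the standard two-point random-perturbation gradient estimate on a toy quadratic. The function names (`loss`, `zo_gradient`, `zo_sgd`), the toy loss, and all hyperparameters are assumptions chosen for illustration, not taken from the paper.

```python
import random

def loss(w):
    # Toy quadratic with its minimum at w = [1, -2]; stands in for a
    # model's forward pass (hypothetical, not the paper's objective).
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def zo_gradient(f, w, eps=1e-3, rng=random):
    # Two-point estimate: perturb the parameters along a random Gaussian
    # direction z and compare two forward evaluations. No backward pass
    # is run, so no intermediate activations have to be kept.
    z = [rng.gauss(0.0, 1.0) for _ in w]
    w_plus = [wi + eps * zi for wi, zi in zip(w, z)]
    w_minus = [wi - eps * zi for wi, zi in zip(w, z)]
    scale = (f(w_plus) - f(w_minus)) / (2.0 * eps)
    return [scale * zi for zi in z]

def zo_sgd(f, w, lr=0.05, steps=2000, seed=0):
    # Plain SGD driven by the ZO estimate instead of a true gradient.
    rng = random.Random(seed)
    for _ in range(steps):
        g = zo_gradient(f, w, rng=rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w_final = zo_sgd(loss, [0.0, 0.0])
print(w_final, loss(w_final))
```

Each step costs two forward evaluations and yields a noisy estimate, which is consistent with the abstract's remark that ZO methods save memory but tend to converge more slowly than FO methods.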
