Efficient Training on Multiple Consumer GPUs with RoundPipe

Yibin Luo; Shiwei Gao; Huichuan Zheng; Youyou Lu; Jiwu Shu

arXiv:2604.27085·cs.DC·May 1, 2026

TL;DR

RoundPipe is a novel pipeline scheduling method that enables efficient, near-zero-bubble training of large language models on consumer GPUs by dynamically dispatching computation stages across devices.

Contribution

The paper introduces RoundPipe, a pipeline schedule that breaks the weight binding constraint so that any GPU can execute any stage. This allows more flexible GPU utilization and significantly faster training of large models.

Findings

01

Achieves 1.48–2.16× speedup over state-of-the-art baselines.

02

Enables fine-tuning of 1.7B to 32B models on consumer GPUs.

03

Supports LoRA fine-tuning of Qwen3-235B with 31K sequence length on a single server.

Abstract

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism (PP) combined with CPU offloading mitigates these hardware bottlenecks by reducing communication overhead. However, existing PP schedules suffer from an inherent limitation termed the weight binding issue. Binding uneven model stages (e.g., a large LM head) to GPUs limits the pipeline's throughput to that of the GPU with the heaviest load, leading to severe pipeline bubbles. In this paper, we propose RoundPipe, a novel pipeline schedule that breaks the weight binding constraint on consumer GPU servers. RoundPipe treats GPUs as a pool of stateless execution workers and dynamically dispatches computation stages across devices in a round-robin manner, achieving a near-zero-bubble pipeline. To ensure training…
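The weight binding issue described in the abstract can be illustrated with a toy cost model. The sketch below is not the RoundPipe schedule itself. It only compares the steady-state time per micro-batch when each stage is pinned to one GPU against an idealized bound in which any GPU may run any stage. The stage costs, the function names, and the assumption that weight transfers are free are all hypothetical and chosen for illustration.

```python
from typing import List


def bound_step_time(stage_costs: List[float], num_gpus: int) -> float:
    """Steady-state time per micro-batch when stage i is pinned to GPU i % num_gpus.

    The pipeline can only advance as fast as its most heavily loaded GPU.
    """
    load = [0.0] * num_gpus
    for i, cost in enumerate(stage_costs):
        load[i % num_gpus] += cost
    return max(load)


def pooled_step_time(stage_costs: List[float], num_gpus: int) -> float:
    """Idealized bound when GPUs form a stateless pool (assumption: free transfers).

    With enough micro-batches in flight, total work can be spread evenly.
    """
    return sum(stage_costs) / num_gpus


def utilization(stage_costs: List[float], num_gpus: int, step_time: float) -> float:
    """Fraction of GPU time spent on useful work; 1 - utilization is the bubble."""
    return sum(stage_costs) / (num_gpus * step_time)


# Hypothetical per-stage costs: three equal stages plus a heavy LM-head stage.
stage_costs = [1.0, 1.0, 1.0, 3.0]
num_gpus = 4

bound = bound_step_time(stage_costs, num_gpus)
pooled = pooled_step_time(stage_costs, num_gpus)

print("bound step time:", bound, "utilization:", utilization(stage_costs, num_gpus, bound))
print("pooled step time:", pooled, "utilization:", utilization(stage_costs, num_gpus, pooled))
print("idealized speedup:", bound / pooled)
```

In this toy setting the pinned pipeline is limited by the LM-head GPU and half of all GPU time is idle, while the pooled bound keeps every GPU busy. A real schedule also has to account for PCIe transfer time and dependencies between stages, which this model ignores.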

Code & Models

Repositories

itcarrot/RoundPipe (GitHub)
