TimelyFreeze: Adaptive Parameter Freezing Mechanism for Pipeline Parallelism

Seonghye Cho; Jaemin Han; Hyunjin Kim; Euisoo Jung; Jae-Gil Lee

arXiv:2602.05754 · cs.DC · February 9, 2026

TL;DR

TimelyFreeze is an adaptive parameter freezing method that models the pipeline schedule as a directed acyclic graph (DAG) and solves a linear program to choose freeze ratios, improving training throughput by up to 40% on LLaMA-8B with comparable accuracy.

Contribution

It formulates the choice of parameter freeze ratios as a linear program over a DAG model of the pipeline schedule, minimizing batch execution time under accuracy constraints.

Findings

01. Achieves up to 40% training throughput improvement on LLaMA-8B.

02. Maintains comparable accuracy while reducing training time.

03. Generalizes across various pipeline-parallel configurations.

Abstract

Pipeline parallelism enables training models that exceed single-device memory, but practical throughput remains limited by pipeline bubbles. Although parameter freezing can improve training throughput by adaptively skipping backward computation, existing methods often over-freeze parameters, resulting in unnecessary accuracy degradation. To address this issue, we propose TimelyFreeze, which models the pipeline schedule as a directed acyclic graph and solves a linear program to compute optimal freeze ratios that minimize batch execution time under accuracy constraints. Experiments show that TimelyFreeze achieves up to 40% training throughput improvement on LLaMA-8B with comparable accuracy. Overall, it enables faster large-scale model training without compromising convergence and generalizes across diverse pipeline-parallel settings.
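
The DAG view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and it does not solve the linear program; it only shows how batch execution time can be read off as the longest path through a schedule DAG, and how a freeze ratio that shortens backward steps shrinks that path. Everything here is an assumption made for illustration: the function names (`backward_time`, `toy_schedule`, `batch_time`), the GPipe-style "all forwards, then all backwards" ordering, the step durations, and the linear relation between freeze ratio and backward cost.

```python
from graphlib import TopologicalSorter


def backward_time(full, floor, freeze_ratio):
    """Assumed cost model: backward time falls linearly from `full`
    (nothing frozen) to `floor` (everything frozen)."""
    return full - freeze_ratio * (full - floor)


def toy_schedule(num_stages, num_microbatches, freeze_ratios,
                 fwd=1.0, bwd_full=2.0, bwd_floor=1.0):
    """Build a hypothetical GPipe-style schedule as (durations, edges).

    Nodes are ("F" or "B", stage, microbatch). An edge (u, v) means
    u must finish before v can start.
    """
    durations, edges = {}, []
    last = num_stages - 1
    for s in range(num_stages):
        order = []
        for m in range(num_microbatches):
            durations[("F", s, m)] = fwd
            order.append(("F", s, m))
        for m in range(num_microbatches):
            durations[("B", s, m)] = backward_time(
                bwd_full, bwd_floor, freeze_ratios[s])
            order.append(("B", s, m))
        # Each device runs its own steps one after another.
        edges += list(zip(order, order[1:]))
        for m in range(num_microbatches):
            if s < last:
                # Activations flow forward, gradients flow backward.
                edges.append((("F", s, m), ("F", s + 1, m)))
                edges.append((("B", s + 1, m), ("B", s, m)))
            else:
                # The last stage starts backward after its own forward.
                edges.append((("F", s, m), ("B", s, m)))
    return durations, edges


def batch_time(durations, edges):
    """Batch execution time = longest (critical) path through the DAG."""
    preds = {node: set() for node in durations}
    for u, v in edges:
        preds[v].add(u)
    finish = {}
    for node in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[node]), default=0.0)
        finish[node] = start + durations[node]
    return max(finish.values())


if __name__ == "__main__":
    for ratios in ([0.0, 0.0], [0.5, 0.5]):
        d, e = toy_schedule(2, 2, ratios)
        print(ratios, batch_time(d, e))
```

In this toy setting, a solver would treat the per-stage freeze ratios as decision variables and search for the values that minimize `batch_time` while keeping total freezing within an accuracy budget; the sketch only evaluates given ratios.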

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Network Packet Processing and Optimization