Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment

Hong Li; Zhen Zhou; Honggang Zhang; Yuping Luo; Xinyue Wang; Han Gong; Zhiyuan Liu

arXiv:2602.14462 · cs.LG · February 25, 2026

TL;DR

This paper uncovers a hidden form of inconsistency in data-parallel fine-tuning of large language models, proposing lightweight metrics to diagnose worker-level misalignment that is invisible under standard monitoring.

Contribution

It introduces a novel, model-agnostic diagnostic framework with three metrics to detect silent worker-level divergence during large-scale data-parallel training.

Findings

01. Desynchronization increases loss and gradient dispersion.
02. Reduced gradient alignment correlates with desynchronization.
03. Metrics effectively reveal hidden instability modes.

Abstract

Data-parallel (DP) training with synchronous all-reduce is a dominant paradigm for full-parameter fine-tuning of large language models (LLMs). While parameter synchronization guarantees numerical equivalence of model weights after each iteration, it does not necessarily imply alignment of worker-level optimization dynamics before gradient aggregation. This paper identifies and studies this latent mismatch, termed "silent inconsistency", where cross-worker divergence in losses and gradients can remain invisible under conventional aggregated monitoring signals. We propose a lightweight, model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines. Specifically, we introduce three complementary metrics: loss dispersion, gradient-norm dispersion, and gradient-direction consistency measured by inter-worker…
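To make the three metrics concrete, here is a minimal sketch of how such worker-level signals could be computed from per-worker losses and flattened local gradients gathered before all-reduce. The abstract is truncated and does not give the exact formulas, so every definition below is an assumption chosen for illustration: standard deviation for loss dispersion, coefficient of variation for gradient-norm dispersion, and mean pairwise cosine similarity for direction consistency. The function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def loss_dispersion(losses):
    """Population std of per-worker losses (assumed definition, not the paper's)."""
    return pstdev(losses)


def grad_norm_dispersion(grads):
    """Coefficient of variation of per-worker gradient L2 norms (assumed definition)."""
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    avg = mean(norms)
    return pstdev(norms) / avg if avg > 0 else 0.0


def direction_consistency(grads):
    """Mean pairwise cosine similarity between worker gradients (assumed definition)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na > 0 and nb > 0 else 0.0

    sims = [cosine(a, b) for a, b in combinations(grads, 2)]
    # With a single worker there are no pairs; treat that as perfectly consistent.
    return mean(sims) if sims else 1.0


# Toy data: three workers report identical losses and gradient norms,
# yet one worker's local gradient points in an orthogonal direction.
losses = [2.0, 2.0, 2.0]
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print(loss_dispersion(losses))
print(grad_norm_dispersion(grads))
print(direction_consistency(grads))
```

In this toy case both dispersion values are zero, so an aggregated loss curve would look healthy, while the direction metric exposes the disagreement between workers. That is the kind of mismatch the abstract describes as invisible under aggregated monitoring.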

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Topic Modeling