SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training

Hadi Mohaghegh Dolatabadi, Thalaiyasingam Ajanthan, Sameera Ramasinghe, Chamin P Hewa Koneputugodage, Gil Avraham, Yan Zuo, Violetta Shevchenko, and Alexander Long

arXiv:2603.03592·cs.DC·March 5, 2026

Open Access · 3 Reviews

TL;DR

SENTINEL is a lightweight verification method for pipeline parallel decentralized training. It defends against Byzantine faults without duplicating computation and comes with convergence guarantees for untrusted distributed environments.

Contribution

The paper proposes SENTINEL, a verification mechanism tailored to pipeline parallelism, supported by theoretical guarantees and shown to be effective in practice on large-scale models.

Findings

1. Successfully trains 4B-parameter LLMs across 176 untrusted workers
2. Maintains model convergence and performance under Byzantine faults
3. Provides theoretical convergence guarantees for the proposed method

Abstract

Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. While existing Byzantine-tolerant literature addresses data parallel (DP) training through robust aggregation methods, pipeline parallelism (PP) presents fundamentally distinct challenges. In PP, model layers are distributed across workers where the activations and their gradients flow between stages rather than being aggregated, making traditional DP approaches inapplicable. We propose SENTINEL, a verification mechanism for PP training without computation duplication. SENTINEL employs lightweight momentum-based monitoring using exponential moving averages (EMAs) to detect corrupted inter-stage communication. Unlike existing Byzantine-tolerant approaches for DP that aggregate parameter gradients across replicas, our approach verifies sequential activation/gradient…
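
The abstract's idea of monitoring inter-stage traffic with an exponential moving average can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `EmaMonitor`, the parameters `beta`, `threshold`, and `warmup`, and the relative-distance test are all assumptions chosen for illustration, and the paper's actual test statistics may differ.

```python
import math


class EmaMonitor:
    """Hypothetical EMA-based check on vectors passed between pipeline stages."""

    def __init__(self, beta=0.9, threshold=0.5, warmup=3):
        self.beta = beta            # EMA decay factor (assumed value)
        self.threshold = threshold  # max allowed relative deviation (assumed value)
        self.warmup = warmup        # accepted steps required before flagging starts
        self.ema = None
        self.steps = 0

    def check(self, vec):
        """Return True if `vec` deviates strongly from the running EMA."""
        if self.ema is None:
            # First observation initializes the EMA and is never flagged.
            self.ema = list(vec)
            self.steps = 1
            return False
        # Euclidean distance to the EMA, normalized by the EMA's own norm.
        diff = math.sqrt(sum((v - m) ** 2 for v, m in zip(vec, self.ema)))
        scale = math.sqrt(sum(m * m for m in self.ema)) + 1e-12
        suspicious = self.steps >= self.warmup and diff / scale > self.threshold
        if not suspicious:
            # Only accepted vectors update the EMA, so flagged inputs cannot drag it.
            self.ema = [self.beta * m + (1 - self.beta) * v
                        for m, v in zip(self.ema, vec)]
            self.steps += 1
        return suspicious


if __name__ == "__main__":
    monitor = EmaMonitor()
    honest = [[1.0, 2.0, 3.0], [1.1, 1.9, 3.0], [0.9, 2.1, 3.1], [1.0, 2.0, 2.9]]
    print([monitor.check(v) for v in honest])  # honest traffic stays unflagged
    print(monitor.check([10.0, -5.0, 0.0]))    # a grossly corrupted vector is flagged
```

In this sketch the monitor needs no second copy of the stage's computation: it only compares each received vector with a running summary of past traffic, which is the sense in which such a check avoids duplication.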

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. **Significance.** The paper discusses decentralized training, a promising approach to making LLM training accessible to small research labs and to academic and individual researchers who lack access to massive centralized clusters. The authors address the problem of Byzantine tolerance, which is known to be a major roadblock to adopting decentralized training.
2. **Originality.** The paper goes beyond most prior work and addresses Byzantine tolerance when using pipeline parallelism, whi

Weaknesses

1. **No results for adversarially designed attacks.** The authors only evaluate common generic attacks (L162-173, L480-481), such as sending constants, random values, or transformations of true activations/gradients. They don't evaluate adversarial attacks specifically designed to bypass the proposed method (e.g. by sending random data mimicking the tracked EMA-based metrics). It is difficult to infer bounds on their validation loss impact from the provided theoretical derivations.
2. **No resul

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper claims to be the first comprehensive study of vulnerabilities unique to decentralized training with hybrid data–pipeline parallelism, and it introduces a suite of training-interruption attacks that serve as benchmarks for evaluating the security of future systems.
2. The theoretical analysis demonstrates that undetected malicious workers have a negligible impact on the convergence properties.
3. The authors integrate their method with SWARM parallelism to demonstrate its remarkable ve

Weaknesses

1. The authors claim that "the paper considered the first comprehensive exploration of secure and verifiable PP decentralized training by identifying". In this setting, billion-parameter LLMs must be trained through internet-scale communication among distributed nodes, yet the paper does not discuss all possible topologies of the interconnected distributed nodes.
2. Due to the issue listed above, the advantage of using decentralized training with hybrid data–pipeline p

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- Novel threat model: Addresses pipeline-parallel decentralized training security, an underexplored but increasingly relevant setting.
- Lightweight design: Verification via EMAs and statistical tests avoids costly redundancy or gradient aggregation.
- Comprehensive evaluation: Covers numerous attack types (activation, gradient, mixed, and adaptive attacks) across large-scale distributed setups.

Weaknesses

- Trusted verifier assumption: SENTINEL depends critically on verifier nodes being honest and reliable. If a verifier node is compromised, it can both hide attacks and falsely flag benign workers, effectively collapsing the system’s security. The paper does not discuss mechanisms such as rotating verifiers, distributed verification, or cryptographic attestation to mitigate this.
- Incomplete threat model: The approach targets activation/gradient corruption but ignores broader adversarial behavi

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques