SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training
Hadi Mohaghegh Dolatabadi, Thalaiyasingam Ajanthan, Sameera Ramasinghe, Chamin P Hewa Koneputugodage, Gil Avraham, Yan Zuo, Violetta Shevchenko, and Alexander Long

TL;DR
SENTINEL is a lightweight verification method for pipeline parallel decentralized training. It defends against Byzantine workers without duplicating computation and comes with convergence guarantees in untrusted distributed environments.
Contribution
The paper proposes SENTINEL, a verification mechanism tailored to pipeline parallelism, and supports it with theoretical guarantees and experiments on large-scale models.
Findings
Successfully trains 4B-parameter LLMs across 176 untrusted workers
Maintains model convergence and performance under Byzantine faults
Provides theoretical convergence guarantees for the proposed method
Abstract
Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. While existing Byzantine-tolerant literature addresses data parallel (DP) training through robust aggregation methods, pipeline parallelism (PP) presents fundamentally distinct challenges. In PP, model layers are distributed across workers where the activations and their gradients flow between stages rather than being aggregated, making traditional DP approaches inapplicable. We propose SENTINEL, a verification mechanism for PP training without computation duplication. SENTINEL employs lightweight momentum-based monitoring using exponential moving averages (EMAs) to detect corrupted inter-stage communication. Unlike existing Byzantine-tolerant approaches for DP that aggregate parameter gradients across replicas, our approach verifies sequential activation/gradient…
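To make the idea of momentum-based monitoring concrete, the following is a minimal sketch of how an EMA could be used to screen the vectors passed between two pipeline stages. It is not the authors' algorithm: the abstract only states that SENTINEL uses EMAs to detect corrupted inter-stage communication. The class name `EmaMonitor`, the relative L2 deviation test, and the values of `beta`, `threshold`, and `warmup` are all assumptions chosen for illustration.

```python
import math
import random


class EmaMonitor:
    """Hypothetical monitor for one stage boundary; not the paper's exact test."""

    def __init__(self, beta=0.9, threshold=0.5, warmup=5):
        self.beta = beta            # EMA decay factor (assumed value)
        self.threshold = threshold  # max relative deviation accepted (assumed value)
        self.warmup = warmup        # steps accepted unconditionally while the EMA settles
        self.ema = None
        self.steps = 0              # number of accepted vectors so far

    def check(self, vec):
        """Return True if vec is consistent with the EMA; update the EMA only if so."""
        if self.ema is None:
            self.ema = list(vec)
            self.steps = 1
            return True
        # Relative L2 distance between the incoming vector and the running average.
        diff = math.sqrt(sum((v - m) ** 2 for v, m in zip(vec, self.ema)))
        scale = math.sqrt(sum(m * m for m in self.ema)) + 1e-12
        ok = self.steps < self.warmup or diff / scale <= self.threshold
        if ok:
            # Rejected vectors never enter the average, so they cannot poison it.
            self.ema = [self.beta * m + (1 - self.beta) * v
                        for m, v in zip(self.ema, vec)]
            self.steps += 1
        return ok


# Toy data: honest "activations" are small perturbations of a fixed vector.
rng = random.Random(0)
monitor = EmaMonitor()
honest = [[1.0 + rng.gauss(0, 0.05) for _ in range(16)] for _ in range(20)]
honest_flags = [monitor.check(v) for v in honest]

# A simple corruption: the sender flips the sign of every entry.
flipped = [-x for x in honest[-1]]
attack_ok = monitor.check(flipped)
print(all(honest_flags), attack_ok)
```

In this toy setting the honest vectors stay close to the running average and pass, while the sign-flipped vector deviates by roughly twice the norm of the average and is rejected. A fixed threshold like this one would not stop an attacker who shapes corrupted values to stay near the tracked statistics, which is the kind of adaptive attack one of the reviews below asks about.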
Peer Reviews
Decision: Submitted to ICLR 2026
1. **Significance.** The paper discusses decentralized training, a promising approach to make LLM training accessible for small research labs, academic and individual researchers that don't have access to massive centralized clusters. The authors address the problem of Byzantine tolerance, which is known to be a major roadblock to adopting decentralized training.
2. **Originality.** The paper goes beyond most prior work and addresses Byzantine tolerance in case of using pipeline parallelism, whi
1. **No results for adversarially designed attacks.** The authors only evaluate common generic attacks (L162-173, L480-481), such as sending constants, random values, or transformations of true activations/gradients. They don't evaluate adversarial attacks specifically designed to bypass the proposed method (e.g. by sending random data mimicking the tracked EMA-based metrics). It is difficult to infer bounds on their validation loss impact from the provided theoretical derivations.
2. **No resul
1. The paper claims to be the first comprehensive study of vulnerabilities unique to decentralized training with hybrid data–pipeline parallelism, and it introduces a suite of training-interruption attacks that serve as benchmarks for evaluating the security of future systems.
2. The theoretical analysis demonstrates that undetected malicious workers have a negligible impact on the convergence properties.
3. The authors integrate their method with SWARM parallelism to demonstrate its remarkable ve
1. The authors claim that "the paper considered the first comprehensive exploration of secure and verifiable PP decentralized training by identifying". In my view, this setting requires training billion-parameter LLMs through internet-scale communication among distributed nodes, yet the paper does not discuss the possible topologies of the interconnected distributed nodes.
2. Due to the issue listed above, the advantage of using decentralized training with hybrid data–pipeline p
- Novel threat model: Addresses pipeline-parallel decentralized training security, an underexplored but increasingly relevant setting.
- Lightweight design: Verification via EMAs and statistical tests avoids costly redundancy or gradient aggregation.
- Comprehensive evaluation: Covers numerous attack types (activation, gradient, mixed, and adaptive attacks) across large-scale distributed setups.
- Trusted verifier assumption: SENTINEL depends critically on verifier nodes being honest and reliable. If a verifier node is compromised, it can both hide attacks and falsely flag benign workers, effectively collapsing the system’s security. The paper does not discuss mechanisms such as rotating verifiers, distributed verification, or cryptographic attestation to mitigate this.
- Incomplete threat model: The approach targets activation/gradient corruption but ignores broader adversarial behavi
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
