SNLP: Layer-Parallel Inference via Structured Newton Corrections

Ligong Han; Kai Xu; Hao Wang; Akash Srivastava

arXiv:2605.17842 · cs.LG · May 19, 2026

TL;DR

SNLP is a layer-parallel training and inference framework for Transformers that replaces exact layer Jacobians with cheap structured Newton corrections. Combined with SNLP regularization, it reduces perplexity by up to 23.4% and reaches a 2.3x inference speedup on a 0.5B Nanochat model.

Contribution

The paper proposes Structured Newton Layer Parallelism (SNLP), which replaces exact layer Jacobians with cheap, architecture-induced surrogate dynamics so that Transformer layers can be updated in parallel rather than strictly in sequence.

Findings

01. SNLP improves layer-parallel compatibility and reduces perplexity by up to 23.4%.

02. On a 0.5B Nanochat model, SNLP achieves 2.3x speedup during inference.

03. SNLP regularization enhances the accuracy of structured Newton iterations, benefiting both training and inference.

Abstract

Autoregressive language models execute Transformer layers sequentially, creating a latency bottleneck that is not removed by conventional tensor or pipeline parallelism. We study whether this layerwise dependency can be relaxed by treating the hidden-state trace across layers as the solution of a nonlinear residual equation and solving it with parallel Newton-style updates. While this view is principled, exact Newton corrections require expensive Jacobian-vector products and naive fixed-point iterations are unstable on trained Transformers. We introduce Structured Newton Layer Parallelism (SNLP), a training and inference framework that replaces exact layer Jacobians with cheap architecture-induced surrogate dynamics. In residual Transformers, this yields Identity Newton (IDN), where the correction reduces to a prefix-sum-like update; in mHC-style architectures, HC Newton (HCN) uses the…
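To make the "prefix-sum-like update" concrete, here is a minimal NumPy sketch of one plausible reading of an identity-surrogate Newton iteration on a toy residual stack. It is an illustration under stated assumptions, not the authors' implementation: the layer form x_{l+1} = x_l + f_l(x_l), the tanh toy layers, and the names `make_layers`, `sequential_forward`, and `identity_newton_forward` are all hypothetical. The residual is r_l = x_{l+1} - x_l - f_l(x_l). Approximating each layer Jacobian by the identity turns the Newton system into delta_{l+1} - delta_l = -r_l with delta_0 = 0, whose solution is a negated prefix sum of the residuals.

```python
import numpy as np

def make_layers(num_layers, dim, seed=0, scale=0.1):
    """Toy residual branches f_l(x) = tanh(x @ W_l); purely illustrative."""
    rng = np.random.default_rng(seed)
    weights = [scale * rng.standard_normal((dim, dim)) for _ in range(num_layers)]
    return [lambda x, W=W: np.tanh(x @ W) for W in weights]

def sequential_forward(layers, x0):
    """Reference: evaluate x_{l+1} = x_l + f_l(x_l) one layer at a time."""
    trace = [x0]
    for f in layers:
        trace.append(trace[-1] + f(trace[-1]))
    return np.stack(trace)

def identity_newton_forward(layers, x0, num_iters):
    """Iterate on the whole hidden-state trace with an identity Jacobian surrogate.

    Assumption: d(x + f_l(x))/dx is replaced by I, so the Newton correction
    solves delta_{l+1} - delta_l = -r_l, i.e. a prefix sum over layers.
    """
    num_layers = len(layers)
    # Initial guess: every layer's hidden state equals the input.
    trace = np.stack([x0] * (num_layers + 1))
    for _ in range(num_iters):
        # Each f_l reads only the previous iterate, so these calls are
        # independent and could run in parallel across layers.
        f_vals = np.stack([f(trace[l]) for l, f in enumerate(layers)])
        residual = trace[1:] - trace[:-1] - f_vals
        delta = -np.cumsum(residual, axis=0)  # prefix sum over the layer axis
        trace[1:] += delta
    return trace

layers = make_layers(num_layers=6, dim=8)
x0 = np.ones(8)
reference = sequential_forward(layers, x0)
approx = identity_newton_forward(layers, x0, num_iters=len(layers))
```

In this toy setting each sweep makes at least one more layer exact, so as many sweeps as layers reproduces the sequential result up to floating-point error. Any latency benefit would depend on an acceptable trace being reached in far fewer sweeps than layers. The sketch does not model that, nor the HC Newton variant or the regularization described in the abstract.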

Code & Models

Repositories

phymhan/nanochat-snlp (GitHub)