Inter-Layer Hessian Analysis of Neural Networks with DAG Architectures

Maxim Bolshim (1), Alexander Kugaevskikh (1) ((1) ITMO University, Saint Petersburg, Russia)

arXiv:2604.11639 · cs.LG · April 14, 2026

TL;DR

This paper introduces a formalism that decomposes the loss Hessian of a neural network into inter-layer blocks indexed by the architecture's DAG, exposing curvature interactions between layers and supplying diagnostic metrics for network geometry.

Contribution

The paper presents an analytical decomposition of the Hessian for DAG-structured networks, together with new metrics for analyzing layer interactions and residual curvature effects.

Findings

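01 The decomposition $H = H^{GN} + H^{T}$ separates the Gauss-Newton (convex) component from the tensor (residual-curvature) component.

02 For ReLU and other piecewise-linear activations, the tensor component of the input Hessian vanishes almost everywhere.

03 Experiments on MLPs and ResNet-18 support the theoretical results.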

Abstract

Modern automatic differentiation frameworks (JAX, PyTorch) return the Hessian of the loss function as a monolithic tensor, without exposing the internal structure of inter-layer interactions. This paper presents an analytical formalism that explicitly decomposes the full Hessian into blocks indexed by the DAG of an arbitrary architecture. The canonical decomposition $H = H^{GN} + H^{T}$ separates the Gauss-Newton component (convex part) from the tensor component (residual curvature responsible for saddle points). For piecewise-linear activations (ReLU), the tensor component of the input Hessian vanishes ($H_{v,w}^{T} \equiv 0$ a.e., $H_{v,w}^{f} = H_{v,w}^{GN} \succeq 0$); the full parametric Hessian contains residual terms that do not reduce to the GGN. Building on this decomposition, we introduce diagnostic metrics (inter-layer resonance $R$, geometric…
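As a rough illustration of the split $H = H^{GN} + H^{T}$ described in the abstract, the sketch below works through a toy case in plain Python. It is not the paper's DAG-indexed formalism: the one-hidden-layer network, the squared-error loss, the weights `W1` and `W2`, the target `Y`, the point `X`, and every function name are assumptions made for this example. For $L(x) = \tfrac{1}{2}(f(x) - y)^2$, the Hessian with respect to the input $x$ is $\nabla f\,\nabla f^{\top} + (f - y)\,\nabla^2 f$. The first term plays the role of the Gauss-Newton part; the second is the residual part, which carries the activation's second derivative and so drops out for ReLU away from its kink.

```python
import math

# Each activation is a triple: (value, first derivative, second derivative).
TANH = (math.tanh,
        lambda z: 1.0 - math.tanh(z) ** 2,
        lambda z: -2.0 * math.tanh(z) * (1.0 - math.tanh(z) ** 2))
RELU = (lambda z: max(z, 0.0),
        lambda z: 1.0 if z > 0 else 0.0,
        lambda z: 0.0)  # zero almost everywhere (undefined only at z = 0)

# Hypothetical toy model: 2 inputs, 3 hidden units, scalar output.
W1 = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
W2 = [1.0, -2.0, 0.5]
Y = 0.25           # assumed regression target
X = [0.4, -0.6]    # evaluation point, chosen away from any ReLU kink


def preacts(x):
    """Hidden pre-activations z = W1 @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W1]


def forward(x, act):
    """Network output f(x) = sum_j W2[j] * act(z_j)."""
    return sum(w2 * act[0](z) for w2, z in zip(W2, preacts(x)))


def grad_f(x, act):
    """Gradient of f with respect to the input x."""
    z = preacts(x)
    return [sum(W2[j] * act[1](z[j]) * W1[j][i] for j in range(len(W1)))
            for i in range(len(x))]


def hessian_parts(x, act):
    """Return (H_GN, H_T) of L(x) = 0.5 * (f(x) - Y)**2 with respect to x."""
    z, n = preacts(x), len(x)
    g = grad_f(x, act)
    r = forward(x, act) - Y  # residual f - y
    # Gauss-Newton part: outer product of grad f (d^2 L / d f^2 = 1 here).
    h_gn = [[g[i] * g[k] for k in range(n)] for i in range(n)]
    # Residual part: (f - y) times the Hessian of f, which needs act''.
    h_t = [[r * sum(W2[j] * act[2](z[j]) * W1[j][i] * W1[j][k]
                    for j in range(len(W1)))
            for k in range(n)] for i in range(n)]
    return h_gn, h_t


def fd_hessian(x, act, eps=1e-5):
    """Reference Hessian: central differences of the analytic loss gradient."""
    def grad_loss(p):
        r = forward(p, act) - Y
        return [r * gi for gi in grad_f(p, act)]
    n = len(x)
    cols = []
    for k in range(n):
        hi, lo = list(x), list(x)
        hi[k] += eps
        lo[k] -= eps
        cols.append([(a - b) / (2 * eps)
                     for a, b in zip(grad_loss(hi), grad_loss(lo))])
    return [[cols[k][i] for k in range(n)] for i in range(n)]


def max_abs(m):
    """Largest absolute entry of a matrix given as nested lists."""
    return max(abs(v) for row in m for v in row)


def split_error(x, act):
    """Max entrywise gap between H_GN + H_T and the finite-difference Hessian."""
    gn, t = hessian_parts(x, act)
    fd = fd_hessian(x, act)
    n = len(x)
    return max(abs(gn[i][k] + t[i][k] - fd[i][k])
               for i in range(n) for k in range(n))


if __name__ == "__main__":
    for name, act in (("relu", RELU), ("tanh", TANH)):
        _, t = hessian_parts(X, act)
        print(name, "max|H_T| =", max_abs(t), "split error =", split_error(X, act))
```

In this toy setting the residual block is zero for ReLU, so the input Hessian equals the rank-one, positive semidefinite Gauss-Newton term, while tanh leaves a nonzero residual. The sketch covers only the input Hessian; as the abstract notes, the full parametric Hessian retains residual terms even for ReLU.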
