Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation

Deyi Kong; Zaiwei Chen; Shuzhong Zhang; Shancong Mou

arXiv:2602.10905 · cs.LG · April 2, 2026

TL;DR

This paper introduces Natural Hypergradient Descent (NHGD), a scalable bilevel optimization method that efficiently approximates the Hessian inverse using Fisher information. It is backed by high-probability error bounds, sample complexity guarantees, and strong empirical performance.

Contribution

NHGD uses Fisher information to approximate the Hessian inverse efficiently. This lets the approximation be updated in parallel with the inner optimization and reduces the computational overhead of bilevel optimization.
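
For reference, the standard implicit-differentiation form of the hypergradient shows where the Hessian inverse enters. The notation here is our own assumption, not taken from the paper: outer objective $f$, inner objective $g$, outer variable $x$, inner variable $y$, and per-sample inner losses $\ell_i$. We also assume that the inner objective is an average negative log-likelihood $g(x, y) = \frac{1}{n}\sum_i \ell_i(x, y)$ of a well-specified model, so that the empirical Fisher matrix $\hat F$ approximates the Hessian near the inner optimum. The substitution is shown only schematically:

```latex
\nabla \Phi(x) = \nabla_x f\bigl(x, y^*(x)\bigr)
  - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)\,
    \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
    \nabla_y f\bigl(x, y^*(x)\bigr),
\qquad y^*(x) = \arg\min_y g(x, y),

\nabla^2_{yy} g(x, y) \;\approx\; \hat F(x, y)
  = \frac{1}{n} \sum_{i=1}^{n} \nabla_y \ell_i(x, y)\, \nabla_y \ell_i(x, y)^{\top}.
```

The appeal of the substitution is that $\hat F$ is built only from first-order gradients $\nabla_y \ell_i$. A stochastic inner solver already computes these gradients, so the Hessian-inverse approximation can be maintained alongside the inner optimization rather than after it.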

Findings

01

NHGD achieves high-probability error bounds and sample complexity guarantees that match those of state-of-the-art optimize-then-approximate methods.

02

NHGD significantly reduces computational time in large-scale tasks.

03

Empirical results show NHGD's scalability and effectiveness.

Abstract

In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation, namely the need to compute or approximate the Hessian inverse, we exploit the statistical structure of the inner optimization problem and use the empirical Fisher information matrix as an asymptotically consistent surrogate for the Hessian. This design enables a parallel optimize-and-approximate framework in which the Hessian-inverse approximation is updated synchronously with the stochastic inner optimization, reusing gradient information at negligible additional cost. Our main theoretical contribution establishes high-probability error bounds and sample complexity guarantees for NHGD that match those of state-of-the-art optimize-then-approximate methods, while significantly reducing computational time…
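
The optimize-and-approximate idea can be illustrated on a toy scalar problem of our own construction, not one from the paper. The inner problem is Gaussian mean estimation, with known standard deviation `sigma`, on shifted data `x + lam`. The outer objective is `(theta - target)**2 / 2`. Each stochastic gradient computed for an inner SGD step is reused to update a running empirical Fisher estimate. That estimate then stands in for the inner Hessian in the hypergradient. The function names, the uniform running average, and the step-size schedule are hypothetical choices for illustration. This is a minimal sketch of the general technique, not the NHGD update rules.

```python
import random

# Hypothetical toy problem (not from the paper): data x ~ N(mu, sigma^2), and the
# inner loss is the Gaussian negative log-likelihood (x + lam - theta)^2 / (2 sigma^2),
# so theta*(lam) = mu + lam and the inner Hessian is exactly 1 / sigma^2.


def inner_grad(theta, x, lam, sigma):
    """Per-sample gradient of the toy inner loss with respect to theta."""
    return (theta - x - lam) / sigma**2


def optimize_and_approximate(lam, target, mu=2.0, sigma=0.5, steps=20000, seed=0):
    """Run inner SGD and, in the same loop, reuse each stochastic gradient to
    update a running empirical Fisher estimate (a scalar in this toy case).

    Returns (theta, fisher, hypergradient) for the hypothetical outer objective
    f(theta) = (theta - target)^2 / 2.
    """
    rng = random.Random(seed)
    theta = 0.0   # inner variable
    fisher = 0.0  # running mean of squared per-sample gradients
    for t in range(1, steps + 1):
        x = rng.gauss(mu, sigma)              # draw one fresh sample
        g = inner_grad(theta, x, lam, sigma)  # stochastic inner gradient
        fisher += (g * g - fisher) / t        # reuse g: empirical Fisher update
        theta -= (sigma**2 / t) * g           # inner SGD step with decaying step size

    # Implicit-differentiation hypergradient with the Fisher estimate standing in
    # for the inner Hessian. For this toy loss, the derivative of the inner
    # gradient with respect to lam is exactly -1 / sigma^2, and f has no direct
    # dependence on lam, so its direct gradient term is zero.
    cross = -1.0 / sigma**2
    grad_f_theta = theta - target
    hypergrad = 0.0 - cross * grad_f_theta / fisher
    return theta, fisher, hypergrad


lam, target, mu = 1.0, 0.0, 2.0
theta, fisher, hg = optimize_and_approximate(lam, target, mu=mu)
print(f"theta         ~ {theta:.3f}  (true value mu + lam = {mu + lam})")
print(f"Fisher        ~ {fisher:.3f}  (true Hessian 1 / sigma^2 = 4.0)")
print(f"hypergradient ~ {hg:.3f}  (true value mu + lam - target = {mu + lam - target})")
```

In this toy model the inner Hessian is exactly `1 / sigma**2`. Because the Gaussian model is well specified, the averaged squared gradient converges to the same value, so the estimated hypergradient should approach the true value `mu + lam - target`. In the multivariate case, the scalar square `g * g` becomes an outer product and the division becomes a linear solve against the Fisher matrix.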
