Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization

Zonghao Chen; Atsushi Nitanda; Arthur Gretton; Taiji Suzuki

arXiv:2511.14710 · stat.ML · November 19, 2025

Open Access

TL;DR

This paper introduces a novel neural network approach for nonparametric instrumental variable regression, establishing global convergence and generalization bounds through a bilevel optimization framework and validating it empirically.

Contribution

It develops a first-order algorithm for neural network-based 2SLS in NPIV, addressing bilevel optimization challenges with a mean-field Langevin dynamics perspective.
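For readers new to 2SLS, the two-stage structure that makes the problem bilevel can be illustrated with a minimal linear sketch. This is not the paper's neural network algorithm: the linear model, the simulated data, and the function names `two_stage_least_squares` and `ols_slope` are all assumptions chosen for illustration.

```python
import numpy as np


def two_stage_least_squares(x, y, z):
    """Linear 2SLS with intercept: treatment x, outcome y, instrument z (all 1-D)."""
    Z = np.column_stack([np.ones_like(z), z])
    # Stage 1: regress the treatment on the instrument and keep fitted values.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the stage-1 fitted values.
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1]


def ols_slope(x, y):
    """Ordinary least squares slope of y on x, ignoring the instrument."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Simulated data (hypothetical): u is an unobserved confounder that
# affects both x and y, while z affects y only through x.
rng = np.random.default_rng(0)
n = 20_000
true_effect = 2.0
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + 0.5 * rng.normal(size=n)
y = true_effect * x + 2.0 * u + 0.5 * rng.normal(size=n)

beta_ols = ols_slope(x, y)
beta_2sls = two_stage_least_squares(x, y, z)
print(f"OLS slope:  {beta_ols:.3f}")
print(f"2SLS slope: {beta_2sls:.3f} (true effect {true_effect})")
```

In this toy setup the OLS slope is biased upward by the confounder, while the 2SLS slope should land close to the true effect. The second stage depends on the solution of the first, which is the nesting that a bilevel formulation makes explicit.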

Findings

01

Proves global convergence of neural networks in NPIV 2SLS.

02

Provides generalization bounds highlighting the trade-off in Lagrange multiplier choice.

03

Empirically demonstrates effectiveness on a reinforcement learning benchmark.

Abstract

We establish the first global convergence result of neural networks for the two-stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD). Unlike standard MFLD, however, our 2SLS setting entails a bilevel optimization problem in the space of probability measures. To address this challenge, we leverage the penalty gradient approach recently developed for bilevel optimization, which formulates bilevel optimization as a Lagrangian problem. This leads to a novel fully first-order algorithm, termed F²BMLD. Apart from the convergence bound, we further provide a generalization bound, revealing an inherent trade-off in the choice of the Lagrange multiplier between optimization and statistical guarantees. Finally, we empirically validate the…
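The penalty reformulation mentioned in the abstract can be sketched in generic form. The block below is an illustration under assumptions, not the paper's exact objective: the symbols $h$, $g$, $F$, $G$, and $\lambda$ are introduced here, the squared-loss choice for both stages is one common way to write 2SLS as a bilevel problem, and the paper's version is posed over probability measures and may include further terms.

```latex
% NPIV model: structural function h_0, treatment X, instrument Z, outcome Y.
\[
  Y = h_0(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid Z] = 0 .
\]
% 2SLS as a bilevel problem (assumed squared-loss form).
\[
  \min_{h} \; F(h, g_h) := \mathbb{E}\big[(Y - g_h(Z))^2\big]
  \quad \text{s.t.} \quad
  g_h \in \arg\min_{g} \; G(h, g) := \mathbb{E}\big[(h(X) - g(Z))^2\big].
\]
% Penalty (Lagrangian) reformulation with multiplier \lambda > 0.
\[
  \mathcal{L}_{\lambda}(h, g)
  = F(h, g) + \lambda \Big( G(h, g) - \min_{g'} G(h, g') \Big).
\]
```

In this generic form the penalty term is nonnegative and vanishes exactly when $g$ solves the inner problem, so a larger $\lambda$ enforces the first stage more strictly. Because only values and gradients of $F$ and $G$ appear, the reformulation admits first-order methods.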

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference