Conjugate Learning Theory: Uncovering the Mechanisms of Trainability and Generalization in Deep Neural Networks

Binchuan Qi

arXiv:2602.16177 · stat.ML · February 20, 2026

Open Access

TL;DR

This paper introduces a conjugate learning framework, built on convex conjugate duality, that characterizes the trainability and generalization of deep neural networks and pairs the theory with empirical validation.

Contribution

The paper develops a theory of DNN learnability based on convex conjugate duality, providing bounds on training and generalization error and analyzing the effects of architecture and data.

Findings

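01. Mini-batch SGD training of DNNs reaches global optima of the empirical risk by jointly controlling the extreme eigenvalues of a structure matrix and the gradient energy.

02. Batch size and model architecture (depth, parameter count, sparsity, skip connections) shape the non-convex optimization.

03. Theoretical bounds on generalization error depend on information loss and feature-label entropy.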

Abstract

In this work, we propose a notion of practical learnability grounded in finite sample settings, and develop a conjugate learning theoretical framework based on convex conjugate duality to characterize this learnability property. Building on this foundation, we demonstrate that training deep neural networks (DNNs) with mini-batch stochastic gradient descent (SGD) achieves global optima of empirical risk by jointly controlling the extreme eigenvalues of a structure matrix and the gradient energy, and we establish a corresponding convergence theorem. We further elucidate the impact of batch size and model architecture (including depth, parameter count, sparsity, skip connections, and other characteristics) on non-convex optimization. Additionally, we derive a model-agnostic lower bound for the achievable empirical risk, theoretically demonstrating that data determines the fundamental limit…
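The abstract builds on convex conjugate duality but, as excerpted here, does not show the paper's own formulation. The sketch below is therefore only a standard textbook illustration of the underlying idea, not the author's method: for f(z) = log-sum-exp, the convex conjugate on the probability simplex is the negative entropy, and the Fenchel-Young gap f(z) + f*(p) - <z, p> equals KL(p || softmax(z)). All function names and the example vectors are assumptions chosen for illustration.

```python
import math

def logsumexp(z):
    """f(z) = log(sum_i exp(z_i)), computed with the usual max shift."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def softmax(z):
    """Gradient of logsumexp; maps logits to a probability vector."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def neg_entropy(p):
    """f*(p) = sum_i p_i log p_i on the simplex, with 0 log 0 taken as 0."""
    return sum(v * math.log(v) for v in p if v > 0)

def fenchel_young_gap(z, p):
    """f(z) + f*(p) - <z, p>; nonnegative by the Fenchel-Young inequality."""
    inner = sum(a * b for a, b in zip(z, p))
    return logsumexp(z) + neg_entropy(p) - inner

def kl(p, q):
    """KL divergence KL(p || q) for probability vectors p and q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical logits and target distribution, for illustration only.
z = [2.0, -1.0, 0.5]
p = [0.7, 0.2, 0.1]

gap = fenchel_young_gap(z, p)
kl_value = kl(p, softmax(z))

# With a one-hot target the gap reduces to the cross-entropy loss.
one_hot = [1.0, 0.0, 0.0]
ce = -math.log(softmax(z)[0])
one_hot_gap = fenchel_young_gap(z, one_hot)
```

The gap vanishes exactly when p = softmax(z), which is one way to see the familiar cross-entropy loss as a duality gap. Whether the paper uses this particular pairing is not stated in the excerpt.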

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications