A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks
Uday Kiran Reddy Tadipatri, Benjamin D. Haeffele, Joshua Agterberg, René Vidal

TL;DR
This paper introduces a convex relaxation framework for analyzing the generalization of parallel positively homogeneous neural networks, providing bounds that scale nearly linearly with network width across various models.
Contribution
It develops a unified convex relaxation approach for deriving generalization bounds applicable to a broad class of positively homogeneous neural networks, including novel models like multi-head attention.
Findings
Generalization bounds scale almost linearly with network width.
Framework applies to diverse models including matrix sensing and attention mechanisms.
Links the non-convex ERM problem to a convex optimization problem over prediction functions, which gives a global, achievable lower bound on the ERM objective.
Abstract
We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks--a class of neural networks whose input-output map decomposes as the sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, deep linear and ReLU networks, and more. Our general framework is based on linking the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower-bound to the ERM problem. We exploit this convex lower-bound to perform generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply our general framework to a wide variety of models ranging from…
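To make the phrase "sum of positively homogeneous maps" concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's construction: it assumes each parallel branch is a two-layer ReLU unit v * relu(<w, x>), and the names `branch`, `parallel_net`, and `scale` are hypothetical. Under that assumption, scaling every parameter by alpha > 0 scales the network output by alpha squared (degree-2 positive homogeneity in the parameters).

```python
import random

def relu(z):
    return max(0.0, z)

def branch(x, w, v):
    # One sub-network (assumed form): v * relu(<w, x>).
    # Scaling (w, v) by alpha > 0 scales the output by alpha**2.
    return v * relu(sum(wi * xi for wi, xi in zip(w, x)))

def parallel_net(x, params):
    # The network output is the sum of the branch outputs;
    # the number of branches plays the role of the network width.
    return sum(branch(x, w, v) for w, v in params)

def scale(params, alpha):
    # Multiply every parameter of every branch by alpha.
    return [([alpha * wi for wi in w], alpha * v) for w, v in params]

rng = random.Random(0)
d, r = 3, 4  # input dimension and width (arbitrary illustrative values)
x = [rng.gauss(0, 1) for _ in range(d)]
params = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1)) for _ in range(r)]

alpha = 2.5
base = parallel_net(x, params)
scaled = parallel_net(x, scale(params, alpha))

# Up to floating-point rounding, scaled equals alpha**2 * base.
print(scaled, alpha ** 2 * base)
```

The same check fails for negative alpha, since relu(-z) is not -relu(z); that is why the homogeneity is called positive.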
Taxonomy
Topics: Graph theory and applications · Gene Regulatory Network Analysis · Complex Network Analysis Techniques
Methods: Linear Layer · Softmax · Attention Is All You Need · Multi-Head Attention
