Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression
Zhihang Li, Zhao Song, Zifan Wang, Junze Yin

TL;DR
This paper analyzes the local convergence of an approximate Newton method for training a two-layer nonlinear regression model with a softmax-activated first layer, providing theoretical guarantees and complexity analysis.
Contribution
It analyzes a two-layer regression model whose first layer is softmax-activated and establishes local convergence guarantees for an approximate Newton method applied to it.
Findings
The Hessian of the loss function is positive definite and Lipschitz continuous.
The algorithm converges to an $\epsilon$-approximate minimizer in $O(\log(1/\epsilon))$ iterations.
Each iteration requires $\tilde{O}(\text{nnz}(C) + d^{\omega})$ time.
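To make the findings above concrete, here is a minimal sketch of a Newton iteration on a toy loss whose Hessian is positive definite and Lipschitz continuous. It is not the paper's model or algorithm: the loss (log-sum-exp of `A @ x` plus a ridge term), the names `A`, `lam`, `grad_and_hessian`, and `newton_minimize`, and the use of the exact Hessian with a dense solve are all assumptions chosen for illustration. The paper's method instead uses an approximate Newton step.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_and_hessian(A, x, lam):
    # Toy loss (an assumption, not the paper's): f(x) = logsumexp(A x) + (lam / 2) * ||x||^2
    p = softmax(A @ x)
    grad = A.T @ p + lam * x
    # A^T (diag(p) - p p^T) A is positive semidefinite; adding lam * I
    # makes the Hessian positive definite with smallest eigenvalue >= lam.
    hess = A.T @ (np.diag(p) - np.outer(p, p)) @ A + lam * np.eye(A.shape[1])
    return grad, hess

def newton_minimize(A, x0, lam=1.0, tol=1e-10, max_iter=50):
    # Plain (undamped) Newton iteration with the exact Hessian.
    x = x0.astype(float).copy()
    for k in range(max_iter):
        g, H = grad_and_hessian(A, x, lam)
        if np.linalg.norm(g) <= tol:
            return x, k
        # Solve H * step = g rather than forming the inverse explicitly.
        x = x - np.linalg.solve(H, g)
    return x, max_iter

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((20, 5))  # hypothetical data matrix
x_star, iters = newton_minimize(A, np.zeros(5))
print("iterations:", iters)
```

Because the toy Hessian is bounded below by `lam` times the identity and varies smoothly, the iteration started near the minimizer needs only a handful of steps, which is the kind of behaviour the local convergence guarantee describes. The dense `np.linalg.solve` call costs cubic time in `d`, so it does not reproduce the per-iteration cost stated above.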
Abstract
There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Matrix Theory and Algorithms · Machine Learning and ELM
Methods: Softmax
