Convergence of Two-Layer Regression with Nonlinear Units

Yichuan Deng; Zhao Song; Shenghao Xie

arXiv:2308.08358 · cs.LG · August 17, 2023 · 2 cites

Open Access

TL;DR

This paper analyzes the convergence of a two-layer regression model with nonlinear units, namely softmax and ReLU, and provides theoretical guarantees together with an algorithm based on an approximate Newton method.

Contribution

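It derives a closed-form Hessian for the softmax ReLU regression problem, proves that the Hessian is Lipschitz continuous and positive semidefinite under certain assumptions, and proposes a convergent greedy algorithm based on an approximate Newton method.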

Findings

01

The Hessian of the loss function is derived in closed form.

02

The Hessian is shown to be Lipschitz continuous and positive semidefinite (PSD) under certain assumptions.

03

The proposed greedy algorithm converges in distance to the optimal solution; under a relaxed Lipschitz condition, it converges in loss value.

Abstract

Large language models (LLMs), such as ChatGPT and GPT4, have shown outstanding performance in many tasks in human life. Attention computation plays an important role in training LLMs. The softmax unit and the ReLU unit are key structures in attention computation. Inspired by them, we put forward a softmax ReLU regression problem. Generally speaking, our goal is to find an optimal solution to the regression problem involving the ReLU unit. In this work, we calculate a closed-form representation for the Hessian of the loss function. Under certain assumptions, we prove the Lipschitz continuity and the positive semidefiniteness (PSD) of the Hessian. Then, we introduce a greedy algorithm based on an approximate Newton method, which converges in the sense of the distance to the optimal solution. Lastly, we relax the Lipschitz condition and prove convergence in the sense of the loss value.
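To make the idea of an approximate Newton iteration concrete, the sketch below runs a damped Gauss-Newton loop on a toy softmax regression loss. It is an illustration under stated assumptions, not the paper's algorithm: the loss 0.5 * ||softmax(Ax) - b||^2 + 0.5 * lam * ||x||^2, the Gauss-Newton surrogate J^T J + lam * I standing in for the Hessian, the backtracking line search, and the names `approx_newton`, `lam`, and `steps` are all hypothetical choices made here, and the ReLU unit is omitted.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss(A, b, x, lam):
    # Toy objective (an assumption, not the paper's loss).
    r = softmax(A @ x) - b
    return 0.5 * (r @ r) + 0.5 * lam * (x @ x)

def approx_newton(A, b, x0, lam=1e-2, steps=20):
    x = np.asarray(x0, dtype=float).copy()
    history = [loss(A, b, x, lam)]
    for _ in range(steps):
        f = softmax(A @ x)
        J = (np.diag(f) - np.outer(f, f)) @ A    # Jacobian of softmax(Ax) w.r.t. x
        g = J.T @ (f - b) + lam * x              # exact gradient of the toy loss
        H = J.T @ J + lam * np.eye(x.size)       # positive definite Hessian surrogate
        d = -np.linalg.solve(H, g)               # approximate Newton direction
        # Backtracking (Armijo) line search so the loss never increases.
        t = 1.0
        new = loss(A, b, x + t * d, lam)
        while new > history[-1] + 1e-4 * t * (g @ d) and t > 1e-10:
            t *= 0.5
            new = loss(A, b, x + t * d, lam)
        if new > history[-1]:
            break                                # no acceptable step found
        x = x + t * d
        history.append(new)
    return x, history
```

Because the surrogate matrix is positive definite, each direction `d` is a descent direction, and the line search keeps the recorded loss values non-increasing. The paper's guarantees rely on its own closed-form Hessian and assumptions, which this sketch does not reproduce.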

Taxonomy

Topics: Machine Learning and ELM · Machine Learning and Data Classification · Advanced Bandit Algorithms Research

Methods: Softmax