Attention Scheme Inspired Softmax Regression
Yichuan Deng, Zhihang Li, Zhao Song

TL;DR
This paper introduces a new softmax regression approach inspired by the softmax unit in large language models, providing theoretical convergence guarantees for greedy algorithms in training softmax functions.
Contribution
It proposes a softmax regression problem with a greedy algorithm and offers theoretical convergence analysis, bridging softmax applications in LLMs and convex optimization.
Findings
Proves convergence of the greedy algorithm for the proposed softmax regression.
Provides theoretical support for using greedy algorithms in softmax training.
Connects softmax functions in LLMs with convex optimization techniques.
Abstract
Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible next words or phrases, given a sequence of input words. This distribution is then used to select the most likely next word or phrase, based on the probabilities assigned by the model. The softmax unit plays a crucial role in training LLMs, as it allows the model to learn from the data by adjusting the weights and biases of the neural network. In the area of convex optimization, for example when using the central path method to solve linear programming, the softmax function has been used as a crucial tool for controlling the progress and stability of the potential function [Cohen, Lee and Song STOC 2019, Brand SODA 2020]. In this work, inspired by the softmax unit, we…
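The role of the softmax unit described in the abstract, turning raw scores into a distribution over possible next words and then selecting the most likely one, can be illustrated with a minimal sketch. The three-word vocabulary and the score values below are hypothetical toy inputs chosen for illustration, not taken from the paper, and the code shows only the standard softmax function, not the paper's regression formulation or its greedy algorithm.

```python
import numpy as np

def softmax(logits):
    """Map a vector of real-valued scores to a probability distribution."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical toy vocabulary and scores (assumptions for illustration only).
vocab = ["cat", "dog", "fish"]
logits = np.array([2.0, 1.0, 0.1])

probs = softmax(logits)                   # distribution over candidate next words
next_word = vocab[int(np.argmax(probs))]  # pick the most likely candidate
```

Here `probs` is non-negative and sums to 1, and `next_word` is the entry with the largest score, since softmax preserves the ordering of its inputs.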
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning in Materials Science · Machine Learning and Data Classification
Methods: Softmax
