The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Fulu Li

TL;DR
This paper analyzes mathematical and probabilistic optimization techniques in generative AI, proposing novel solutions for sub-word encoding, hyperparameter tuning, positional encoding, attention mechanisms, and quantization to enhance model performance.
Contribution
It introduces new optimization methods and model enhancements for key components in Transformer-based generative AI models, advancing algorithmic and probabilistic approaches.
Findings
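An optimal sub-word encoding (SWE) solution that starts from initial settings similar to BPE and, like WordPiece, aims to maximize the likelihood of the training data.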
Cross entropy method effectively optimizes hyperparameters for word2vec.
Proposed probabilistic FlashAttention improves attention efficiency.
Abstract
In this paper, we give an in-depth analysis of the mathematical problem formulations and the probabilistic optimization explorations for some of the key components of the Transformer model [33] in the field of generative AI. We explore and discuss potential further enhancements to current state-of-the-art methods for some key underlying technologies of generative AI models from an algorithmic and probabilistic optimization perspective. In particular, we present an optimal solution for sub-word encoding (SWE) based on initial settings similar to those of the byte-pair encoding (BPE) algorithm in [9], with objectives similar to those of the WordPiece approach in [28, 31], to maximize the likelihood of the training data. We also present a cross entropy optimization method to optimize hyperparameters for the word2vec model [17]. In addition, we propose a factored combination of rotary positional encoding…
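The abstract does not spell out how the cross entropy method is applied to word2vec, so the following is only a generic sketch of the technique under stated assumptions, not the paper's procedure. It samples candidate settings from a Gaussian, keeps the lowest-loss "elite" candidates, and refits the Gaussian to them. The function name `cross_entropy_minimize`, its parameters, and the quadratic `toy_loss` (a hypothetical stand-in for a validation loss over two hyperparameters) are all illustrative.

```python
import random
import statistics


def cross_entropy_minimize(objective, mean, std, n_samples=50, n_elite=10,
                           n_iters=30, seed=0):
    """Minimize `objective` over a real vector with a Gaussian cross-entropy method."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(n_iters):
        # Sample candidate settings from the current Gaussian proposal.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the lowest-loss (elite) candidates.
        elite = sorted(samples, key=objective)[:n_elite]
        # Refit the proposal to the elite set, one dimension at a time.
        for d in range(len(mean)):
            column = [e[d] for e in elite]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6  # small floor keeps std positive
    return mean


def toy_loss(x):
    # Hypothetical stand-in for a validation loss; its minimum is at (3, -1).
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best = cross_entropy_minimize(toy_loss, mean=[0.0, 0.0], std=[5.0, 5.0])
```

In a real tuning run the objective would train and evaluate a model for each sampled setting, so the sample count and iteration budget would likely need to be much smaller than in this toy example.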
Taxonomy
Topics: Evolutionary Algorithms and Applications · Computability, Logic, AI Algorithms · AI-based Problem Solving and Planning
Methods: Linear Layer · Dense Connections · Multi-Head Attention · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding · Layer Normalization
