Wavelet GPT: Wavelet Inspired Large Language Models
Prateek Verma

TL;DR
Wavelet GPT integrates wavelet-inspired structures into large language models to enhance training efficiency and performance across multiple data modalities without increasing parameters.
Contribution
This paper introduces a wavelet-based structure into LLMs during pre-training, improving efficiency and performance without adding extra parameters.
Findings
Reaches the same pre-training performance almost twice as fast on text, audio, and images.
With the same number of training steps, achieves gains comparable to pre-training a larger architecture.
Extends to the Long Range Arena benchmark and to various data representations.
Abstract
Large Language Models (LLMs) have ushered in a new wave of artificial intelligence advancements impacting every scientific field and discipline. We live in a world where most of the data around us, e.g., text, audio, and music, has a multi-scale structure. This paper infuses LLMs with a traditional signal processing idea, namely wavelets, during pre-training to take advantage of the structure. Without adding any extra parameters to a GPT-style LLM architecture in an academic setup, we achieve the same pre-training performance almost twice as fast in text, audio, and images. This is done by imposing a structure on intermediate embeddings. When trained for the same number of training steps, we achieve significant gains in performance, which is comparable to pre-training a larger neural architecture. Further, we show this extends to the Long Range Arena benchmark and several input…
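The abstract says only that a structure is imposed on intermediate embeddings, without giving the operation. As a rough illustration of what a parameter-free, multi-scale structure could look like, the sketch below applies a causal moving average along the token axis, with a different window length for each embedding coordinate. The function name `causal_multiscale_average`, the choice of a plain moving average, and the window lengths are all assumptions made for this example and are not taken from the text above.

```python
def causal_multiscale_average(embeddings, kernel_sizes):
    """Causal moving average per embedding coordinate (illustrative assumption).

    embeddings:   list of token vectors, shape (num_tokens, num_dims)
    kernel_sizes: one window length (>= 1) per embedding coordinate

    Output at token t, coordinate d is the mean of the last
    kernel_sizes[d] values of that coordinate, using only tokens <= t.
    A window of 1 leaves the coordinate unchanged.
    """
    num_tokens = len(embeddings)
    num_dims = len(kernel_sizes)
    out = [[0.0] * num_dims for _ in range(num_tokens)]
    for d, k in enumerate(kernel_sizes):
        running = 0.0  # sum of the values currently inside the window
        for t in range(num_tokens):
            running += embeddings[t][d]
            if t >= k:
                # drop the value that just left the window
                running -= embeddings[t - k][d]
            count = min(t + 1, k)  # shorter window at the start of the sequence
            out[t][d] = running / count
    return out


# Hypothetical usage: 4 tokens, 2 coordinates.
# Coordinate 0 keeps full resolution (window 1); coordinate 1 is smoothed (window 2).
tokens = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
smoothed = causal_multiscale_average(tokens, kernel_sizes=[1, 2])
```

Because the operation has no learned weights, it adds no parameters, and because each output depends only on current and earlier tokens, it stays compatible with next-token prediction. Whether this matches the paper's exact construction is not established by the abstract.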
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Residual Connection
