Wave Network: An Ultra-Small Language Model
Xin Zhang, Victor S. Sheng

TL;DR
The paper introduces the Wave Network, an ultra-small language model using complex vector token representations, achieving high accuracy in text classification with significantly reduced memory and training time.
Contribution
It presents a novel token representation and update method in a tiny language model, outperforming a single Transformer layer with BERT embeddings.
Findings
Achieves 90.91% accuracy on AG News with wave interference and 91.66% with wave modulation
Reduces memory and training time by over 77% and 85%, respectively
Approaches BERT base accuracy with only 2.4 million parameters
Abstract
We propose an innovative token representation and update method in a new ultra-small language model: the Wave Network. Specifically, we use a complex vector to represent each token, encoding both global and local semantics of the input text. A complex vector consists of two components: a magnitude vector representing the global semantics of the input text, and a phase vector capturing the relationships between individual tokens and global semantics. Experiments on the AG News text classification task demonstrate that, when generating complex vectors from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91% accuracy with wave interference and 91.66% with wave modulation, outperforming a single Transformer layer using BERT pre-trained embeddings by 19.23% and 19.98%, respectively, and approaching the accuracy of the pre-trained and fine-tuned BERT base…
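The abstract describes each token as a complex vector whose magnitude carries the global semantics of the text and whose phase carries that token's relation to them. The pure-Python sketch below illustrates the idea under stated assumptions rather than reproducing the paper's formulas: it takes the per-dimension L2 norm across tokens as the global magnitude, picks each phase so that the real part of the complex value equals the original embedding entry, and uses elementwise addition and multiplication as stand-ins for wave interference and wave modulation. The names `global_semantics`, `to_complex`, `interfere`, and `modulate` are hypothetical.

```python
import cmath
import math
from typing import List

Vector = List[float]
ComplexVector = List[complex]


def global_semantics(embeddings: List[Vector]) -> Vector:
    """Per-dimension magnitude: the L2 norm of each dimension across all tokens.

    ASSUMPTION: one plausible summary of the text's 'global semantics'.
    """
    dims = len(embeddings[0])
    return [math.sqrt(sum(tok[k] ** 2 for tok in embeddings)) for k in range(dims)]


def to_complex(embeddings: List[Vector]) -> List[ComplexVector]:
    """Map real token embeddings to complex values G_k * exp(i * alpha_jk)."""
    g = global_semantics(embeddings)
    result = []
    for tok in embeddings:
        row = []
        for k, w in enumerate(tok):
            # The token's share of the global magnitude lies in [-1, 1].
            ratio = w / g[k] if g[k] > 0 else 0.0
            # ASSUMPTION: choose alpha with cos(alpha) = ratio, so Re(z) recovers w.
            alpha = math.atan2(math.sqrt(max(0.0, 1.0 - ratio ** 2)), ratio)
            row.append(cmath.rect(g[k], alpha))
        result.append(row)
    return result


def interfere(a: ComplexVector, b: ComplexVector) -> ComplexVector:
    """Interference sketched as elementwise addition: phases reinforce or cancel."""
    return [x + y for x, y in zip(a, b)]


def modulate(a: ComplexVector, b: ComplexVector) -> ComplexVector:
    """Modulation sketched as elementwise multiplication: magnitudes multiply, phases add."""
    return [x * y for x, y in zip(a, b)]


tokens = [[3.0, 0.0], [4.0, 1.0]]  # toy input: 2 tokens, 2 embedding dimensions
z = to_complex(tokens)
print(interfere(z[0], z[1]))
print(modulate(z[0], z[1]))
```

Addition lets aligned phases amplify and opposed phases cancel, while multiplication rescales magnitudes and rotates phases; that contrast is the intuition behind the two variants, though the paper's actual update layers may combine the vectors differently.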
Taxonomy
Topics: Opinion Dynamics and Social Influence
Methods: Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Dropout · Absolute Position Encodings · Label Smoothing · Transformer · Dense Connections · Layer Normalization
