Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters
Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Cheung Hui, Jie Fu

TL;DR
This paper introduces a method to learn hypercomplex multiplication rules from data. The resulting hypercomplex neural network layers work in any dimension n and use roughly 1/n of the parameters of a fully-connected layer. The method is demonstrated on NLP tasks.
Contribution
It proposes a data-driven parameterization of hypercomplex multiplications that is not restricted to the few predefined dimensions (4D, 8D, and 16D) where hypercomplex spaces exist, improving model flexibility and parameter efficiency.
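A minimal NumPy sketch of what a learned multiplication rule can look like. It assumes the layer's k×d weight is assembled as W = Σ_{i=1..n} A_i ⊗ S_i (Kronecker products). Here the n learnable n×n matrices A_i play the role of the multiplication rule, and the n learnable (k/n)×(d/n) blocks S_i hold the weights. This sum-of-Kronecker-products form is an assumption of the sketch, and `PHMLinear` and `phm_weight` are hypothetical names. Under it, the weight costs n³ + kd/n parameters instead of kd, so for n much smaller than d it is roughly 1/n of a dense layer. Also, n only has to divide d and k; it need not be 4, 8, or 16.

```python
import numpy as np


def phm_weight(A: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Assemble a (k, d) weight matrix as W = sum_i kron(A_i, S_i).

    A: (n, n, n)         n learnable n x n "multiplication rule" matrices A_i
    S: (n, k//n, d//n)   n learnable weight blocks S_i
    """
    return sum(np.kron(A[i], S[i]) for i in range(A.shape[0]))


class PHMLinear:
    """Hypothetical drop-in for a d -> k fully-connected layer (no bias)."""

    def __init__(self, d: int, k: int, n: int, seed: int = 0):
        if d % n or k % n:
            raise ValueError("d and k must both be divisible by n")
        rng = np.random.default_rng(seed)
        # In training, both the rules A and the blocks S would be learned;
        # here they are only randomly initialised.
        self.A = rng.standard_normal((n, n, n))
        self.S = rng.standard_normal((n, k // n, d // n)) / np.sqrt(d)

    def num_params(self) -> int:
        # n^3 for the rules + k*d/n for the blocks, versus k*d for a dense layer
        return self.A.size + self.S.size

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d) -> (batch, k), the same interface as a dense layer
        return x @ phm_weight(self.A, self.S).T


d = k = 768
print("dense:", d * k)
for n in (2, 3, 4, 8, 16):  # n = 3 works: only d % n == k % n == 0 is required
    layer = PHMLinear(d, k, n)
    print(f"n={n}:", layer.num_params(), layer(np.ones((2, d))).shape)
```

Because A is learned rather than fixed, the same class covers every n that divides the layer widths. The n³ term only starts to matter when n approaches the square root of kd/n.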
Findings
Achieves comparable performance with fewer parameters in NLP tasks
Learns arbitrary hypercomplex multiplication rules from data
Demonstrates flexibility in LSTM and Transformer architectures
Abstract
Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers), which replace real-valued matrix multiplications in fully-connected layers with Hamilton products of Quaternions, both enjoy parameter savings with only 1/4 learnable parameters and achieve comparable performance in various applications. However, one key caveat is that hypercomplex space only exists at very few predefined dimensions (4D, 8D, and 16D). This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any…
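The abstract's starting point can be checked numerically: a fully-connected layer whose weights and inputs are quaternions combined with the Hamilton product. The NumPy sketch computes such a layer in two ways. First it applies the Hamilton product directly. Then it uses a single real matrix built from four fixed 4×4 sign-and-permutation matrices, one per quaternion component (1, i, j, k). `HAMILTON_A`, `hamilton_fc`, and `kron_sum_fc` are illustrative names, not taken from the paper. The layer stores 4·k·d real weights, where a dense layer with the same input width 4d and output width 4k would need 16kd; this is the 1/4 saving the abstract mentions. The paper's generalization might replace these fixed matrices with learned ones. That is an assumption of this sketch, and it is one way to read the claim that the method subsumes the Hamilton product.

```python
import numpy as np

# Fixed 4 x 4 rule matrices for the weight components (1, i, j, k), acting on
# an input stacked as [x_r; x_i; x_j; x_k].
HAMILTON_A = np.array([
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],      # 1
    [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]],    # i
    [[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]],    # j
    [[0, 0, 0, -1], [0, 0, -1, 0], [0, 1, 0, 0], [1, 0, 0, 0]],    # k
], dtype=float)


def hamilton_fc(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Quaternion FC layer: Hamilton product of weight W (left) and input x.

    W: (4, k, d) weight components W_r, W_i, W_j, W_k
    x: (4, d) input components x_r, x_i, x_j, x_k
    Returns the (4, k) output components.
    """
    Wr, Wi, Wj, Wk = W
    xr, xi, xj, xk = x
    return np.stack([
        Wr @ xr - Wi @ xi - Wj @ xj - Wk @ xk,   # real part
        Wr @ xi + Wi @ xr + Wj @ xk - Wk @ xj,   # i part
        Wr @ xj - Wi @ xk + Wj @ xr + Wk @ xi,   # j part
        Wr @ xk + Wi @ xj - Wj @ xi + Wk @ xr,   # k part
    ])


def kron_sum_fc(A: np.ndarray, W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """The same layer as one real (4k, 4d) matrix: sum_c kron(A_c, W_c)."""
    big_W = sum(np.kron(A[c], W[c]) for c in range(len(A)))
    return (big_W @ x.reshape(-1)).reshape(len(A), -1)


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3, 5))   # 4 * 3 * 5 = 60 real weights
x = rng.standard_normal((4, 5))
print(np.allclose(hamilton_fc(W, x), kron_sum_fc(HAMILTON_A, W, x)))  # True
print(W.size, (4 * 3) * (4 * 5))     # 60 240, i.e. a 1/4 parameter ratio
```

Writing the layer as a sum of Kronecker products makes the weight sharing explicit. Each W_c is reused in four blocks of the real matrix, and only the signs and positions differ.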
Taxonomy
Topics: Computational Physics and Python Applications · Tensor decomposition and applications · Topic Modeling
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Byte Pair Encoding · Label Smoothing · Dropout · Residual Connection · Multi-Head Attention
