Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

Aston Zhang; Yi Tay; Shuai Zhang; Alvin Chan; Anh Tuan Luu; Siu Cheung Hui; Jie Fu

arXiv:2102.08597·cs.LG·February 18, 2021·36 cites

TL;DR

This paper introduces a method that learns hypercomplex multiplication rules from data, yielding hypercomplex neural network layers that use roughly $1/n$ of the parameters of a fully-connected layer for a chosen $n$, and evaluates them on NLP tasks.

Contribution

It proposes a data-driven parameterization of hypercomplex multiplications, extending beyond fixed dimensions like 4D, 8D, and 16D, to improve model flexibility and efficiency.
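One way to picture this parameterization is as a weight matrix built from a sum of Kronecker products, W = sum_i A_i ⊗ S_i, which is the construction the full paper describes for its layer; the formula is not stated on this page, so the sketch below should be read as an illustration under that assumption. The names `kron`, `phm_weight`, `A`, `S` and the sizes `n`, `k`, `d` are hypothetical, and only the Python standard library is used.

```python
import random


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def mat_add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def phm_weight(A, S):
    """Assumed construction: W = sum_i kron(A[i], S[i])."""
    W = kron(A[0], S[0])
    for A_i, S_i in zip(A[1:], S[1:]):
        W = mat_add(W, kron(A_i, S_i))
    return W


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: n must divide both the output size k and input size d.
n, k, d = 4, 64, 128
rng = random.Random(0)

# A[i] (n x n) plays the role of a learned multiplication rule;
# S[i] (k/n x d/n) holds the remaining learnable weights.
A = [rand_matrix(n, n, rng) for _ in range(n)]
S = [rand_matrix(k // n, d // n, rng) for _ in range(n)]

W = phm_weight(A, S)  # same shape as a dense k x d weight matrix

phm_params = n * n * n + n * (k // n) * (d // n)  # n^3 + k*d/n
dense_params = k * d
print(len(W), len(W[0]), phm_params, dense_params)
```

With these sizes the assembled matrix has the shape of a dense 64 x 128 weight, while the learnable entries number n^3 + kd/n rather than kd, which approaches a 1/n ratio once kd is large compared with n^4.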

Findings

01. Achieves comparable performance with fewer parameters in NLP tasks

02. Learns arbitrary hypercomplex multiplication rules from data

03. Demonstrates flexibility in LSTM and Transformer architectures

Abstract

Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers), which replace real-valued matrix multiplications in fully-connected layers with Hamilton products of Quaternions, both enjoy parameter savings with only 1/4 learnable parameters and achieve comparable performance in various applications. However, one key caveat is that hypercomplex space only exists at very few predefined dimensions (4D, 8D, and 16D). This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any…
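The 1/4 figure for Quaternion layers can be checked on a single Hamilton product. The sketch below, which uses only standard quaternion algebra, multiplies two quaternions directly and then through the equivalent 4 x 4 real matrix, whose 16 entries are all determined by 4 free values. The function names `hamilton` and `left_matrix` are illustrative and do not come from the paper.

```python
def hamilton(w, x):
    """Hamilton product w * x of quaternions given as (real, i, j, k)."""
    a, b, c, d = w
    x0, x1, x2, x3 = x
    return (
        a * x0 - b * x1 - c * x2 - d * x3,
        a * x1 + b * x0 + c * x3 - d * x2,
        a * x2 - b * x3 + c * x0 + d * x1,
        a * x3 + b * x2 - c * x1 + d * x0,
    )


def left_matrix(w):
    """4 x 4 real matrix M such that M @ x equals hamilton(w, x)."""
    a, b, c, d = w
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]


def matvec(m, v):
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m)


w = (1, 2, 3, 4)    # 4 free parameters
x = (5, 6, 7, 8)    # input quaternion

direct = hamilton(w, x)
via_matrix = matvec(left_matrix(w), x)
assert direct == via_matrix

# A dense 4 x 4 real matrix would need 16 free parameters instead of 4.
free_params = len(w)
dense_params = sum(len(row) for row in left_matrix(w))
print(direct, free_params, dense_params)
```

The sign pattern in `left_matrix` is fixed in advance by the Quaternion algebra; the approach described in the abstract instead learns such rules from data.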

Code & Models

Videos

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters · SlidesLive

Taxonomy

Topics: Computational Physics and Python Applications · Tensor decomposition and applications · Topic Modeling

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Byte Pair Encoding · Label Smoothing · Dropout · Residual Connection · Multi-Head Attention