TL;DR
HadamRNN uses Hadamard matrices to train orthogonal RNNs with binary and sparse ternary recurrent weights, enabling efficient edge deployment while keeping performance comparable to full-precision models on benchmarks such as the copy task.
Contribution
This work is the first to successfully binarize and ternarize vanilla RNN weights using Hadamard matrices, creating efficient orthogonal RNNs with competitive accuracy.
Findings
Achieves performance comparable to full-precision models on multiple benchmarks.
First binary RNN capable of handling the copy task over 1000 timesteps.
Demonstrates effective training of binary and sparse ternary orthogonal RNNs.
Abstract
Binary and sparse ternary weights in neural networks enable faster computations and lighter representations, facilitating their use on edge devices with limited computational power. Meanwhile, vanilla RNNs are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging. To date, no method has successfully achieved binarization or ternarization of vanilla RNN weights. We present a new approach leveraging the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary orthogonal matrices. This method enables the training of orthogonal RNNs (ORNNs) with binary and sparse ternary recurrent weights, effectively creating a specific class of binary and sparse ternary vanilla RNNs. The resulting ORNNs, called HadamRNN and Block-HadamRNN, are evaluated on benchmarks such as the copy task,…
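To make the idea in the abstract concrete, the following is a minimal sketch of why Hadamard matrices yield binary and sparse ternary matrices that are orthogonal up to a scalar. It is an illustration under stated assumptions, not the paper's parameterization: the Sylvester construction, the column sign flips, the block-diagonal layout, and all function names (`sylvester_hadamard`, `flip_columns`, `block_diag`, `is_scaled_orthogonal`) are choices made here for the example.

```python
import random


def sylvester_hadamard(k):
    """Return the 2**k x 2**k Sylvester Hadamard matrix (entries +1/-1)."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def flip_columns(h, signs):
    """Multiply column j of h by signs[j], where each sign is +1 or -1."""
    return [[x * s for x, s in zip(row, signs)] for row in h]


def block_diag(blocks):
    """Place square blocks on the diagonal of a zero matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                out[offset + i][offset + j] = x
        offset += len(b)
    return out


def is_scaled_orthogonal(m, scale_sq):
    """Check m @ m.T == scale_sq * I, i.e. m / sqrt(scale_sq) is orthogonal."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(m[i], m[j]))
            if dot != (scale_sq if i == j else 0):
                return False
    return True


random.seed(0)

# Binary case (assumption: sign flips on columns as the free parameters).
H = sylvester_hadamard(3)                       # 8 x 8, entries in {-1, +1}
signs = [random.choice((-1, 1)) for _ in range(8)]
W_bin = flip_columns(H, signs)

# Sparse ternary case (assumption: two 4 x 4 Hadamard blocks on the diagonal).
W_ter = block_diag([sylvester_hadamard(2), sylvester_hadamard(2)])

print(is_scaled_orthogonal(W_bin, 8))           # W_bin / sqrt(8) is orthogonal
print(is_scaled_orthogonal(W_ter, 4))           # W_ter / sqrt(4) is orthogonal
```

In this sketch the only real-valued quantity is the scalar `1 / sqrt(scale_sq)`; the matrix itself stores just signs (binary case) or signs and zeros (ternary case), which is what makes such weights cheap to store and multiply.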
Peer Reviews
Decision: ICLR 2025 Poster
The paper proposes a novel approach to parameterizing the weights of orthogonal RNN models using Hadamard matrix theory. The binary and sparse ternary parameterizations are investigated and analyzed on lightweight neural networks for time series. The results reported in Table 2 highlight the potential of the proposed approach in comparison with the ORNN, LSTM, and FastGRNN recurrent models.
The binarization and ternarization algorithms for the orthogonal vanilla RNN are explained only at a high level in Sections 3.3 and 3.4. Providing a basic example that gives the dimensions of the matrices to be binarized or ternarized would make the algorithms more transparent and easier to understand. The results reported in Table 2 and Table 3 are hard to read. Converting some of them into plots would make it more straightforward for the reader to assess the paper's contributions.
The paper is well organized and clearly written, both in its prose and in the presentation of its proposal. Each section flows logically, enhancing the reader's comprehension of the paper's purpose and methodology. Notably, Section 3 (together with the Appendix) stands out for its depth and precision, guiding the reader through the formal details of the proposed approach and leaving little room for ambiguity regarding the study's methods and objectives. The proposal is interesting. Although it is presente
My major concern with this paper is regarding the experimental section. Specifically, in lines 123 - 125, the authors claim: *"Even with binarization, transformer models [...], making them unsuitable for tasks involving long-term dependencies on edge devices."* I find this assertion to be overly restrictive, as recent literature provides a substantial body of work on optimizing Transformers for edge devices and handling long-term dependencies, which is not discussed here. For example, numerous s
1) Novel Approach: The introduction of Hadamard matrices for parameterization is a novel and effective way to address the instability issues associated with binarizing oRNNs. 2) Theoretical Soundness: The paper provides a solid theoretical foundation for the proposed method and demonstrates its consistency. 3) Performance: The empirical results on benchmark datasets (copy task, MNIST, IMDB) are promising, showing comparable performance to full-precision models. Notably, HadamRNN is the first b
At a high-level, while I appreciate the work towards making oRNNs work better and efficient, they themselves have quite a few issues including expressivity. 1) Limited Real-World Applicability: While the benchmarks used are common in the literature, they often do not translate well to real-world scenarios. oRNNs, in general, have not seen widespread adoption in practical applications. 2) Training Time: The authors mention FastGRNN (Kusupati et al., 2018), which suggests that training oRNN mod
