Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices
Gonçalo Mordido, Matthijs Van Keirsbilck, and Alexander Keller

TL;DR
This paper replaces the 1x1 convolutions in 1D time-channel separable models with fixed, sparse random ternary matrices. These layers need no training, perform no multiplications, and can be generated on the chip without memory access, so the same parameter budget supports deeper models, or an existing model can be made cheaper.
Contribution
The authors propose replacing the trainable 1x1 convolutions in 1D time-channel separable models with fixed, sparse random ternary matrices, which adds no weights to train and frees parameter budget for deeper models.
Findings
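Improved command recognition accuracy on Google Speech Commands v1 from 97.21% to 97.41% at the same network size.
Halved the number of trained weights for speech recognition on Librispeech, with only a small loss in word error rate relative to the floating-point baseline.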
Enabled deeper, more expressive models within the same parameter budget.
Abstract
We demonstrate that 1x1-convolutions in 1D time-channel separable convolutions may be replaced by constant, sparse random ternary matrices with weights in {-1, 0, +1}. Such layers do not perform any multiplications and do not require training. Moreover, the matrices may be generated on the chip during computation and therefore do not require any memory access. With the same parameter budget, we can afford deeper and more expressive models, improving the Pareto frontiers of existing models on several tasks. For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from 97.21% to 97.41% at the same network size. Alternatively, we can lower the cost of existing models. For speech recognition on Librispeech, we halve the number of weights to be trained while only sacrificing about 1% of the floating-point baseline's word error rate.
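As a rough illustration of the idea in the abstract, the sketch below builds a fixed sparse ternary matrix from a seed and applies it as a 1x1 (pointwise) convolution using only additions and subtractions. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ternary_matrix` and `pointwise_mix`, the `sparsity` and `seed` parameters, and the sampling scheme (each entry is zero with probability `sparsity`, otherwise -1 or +1 with equal chance) are all hypothetical choices made for this example. Regenerating the matrix from the seed stands in for on-chip generation.

```python
import random


def ternary_matrix(out_channels, in_channels, sparsity=0.9, seed=0):
    """Build a fixed sparse random matrix with entries in {-1, 0, +1}.

    Assumption for this sketch: each entry is 0 with probability `sparsity`,
    otherwise -1 or +1 with equal chance. The matrix is fully determined by
    `seed`, so it can be regenerated on demand instead of being stored.
    """
    rng = random.Random(seed)
    matrix = []
    for _ in range(out_channels):
        row = []
        for _ in range(in_channels):
            if rng.random() < sparsity:
                row.append(0)
            else:
                row.append(rng.choice((-1, 1)))
        matrix.append(row)
    return matrix


def pointwise_mix(x, matrix):
    """Apply `matrix` as a 1x1 convolution to x, indexed as x[channel][time].

    No multiplications are performed: +1 adds the input channel,
    -1 subtracts it, and 0 skips it entirely.
    """
    n_steps = len(x[0])
    out = []
    for row in matrix:
        acc = [0.0] * n_steps
        for channel, weight in enumerate(row):
            if weight == 0:
                continue
            src = x[channel]
            if weight == 1:
                for t in range(n_steps):
                    acc[t] += src[t]
            else:
                for t in range(n_steps):
                    acc[t] -= src[t]
        out.append(acc)
    return out


# Toy input: 4 channels, 3 time steps.
x = [
    [1.0, 2.0, 3.0],
    [0.5, 0.5, 0.5],
    [-1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0],
]
w = ternary_matrix(out_channels=2, in_channels=4, sparsity=0.5, seed=42)
y = pointwise_mix(x, w)  # 2 output channels, 3 time steps
```

In a time-channel separable block, a mixing step like this would follow the trainable depthwise convolution along time; only the depthwise part would then contribute trained weights. That placement is inferred from the abstract's description rather than taken from the paper's code.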
