Unified Sparse-Matrix Representations for Diverse Neural Architectures
Yuzhou Zhu

TL;DR
This paper introduces a unified sparse-matrix framework that expresses CNN, RNN, and Transformer layers as sparse matrix operations, which clarifies what these architectures share and may support more efficient design.
Contribution
The paper presents an algebraic framework that expresses convolutional, recurrent, and self-attention layers as sparse matrix operations, supported by proofs of algebraic isomorphism and by empirical validation.
Findings
Sparse-matrix formulations match or outperform their native counterparts.
The sparse-matrix models converge in a similar or smaller number of epochs.
The framework aligns with GPU parallelism and optimization tools.
Abstract
Deep neural networks employ specialized architectures for vision, sequential and language tasks, yet this proliferation obscures their underlying commonalities. We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. Convolution is realized via an upper-triangular weight matrix performing first-order transformations; recurrence emerges from a lower-triangular matrix encoding stepwise updates; attention arises naturally as a third-order tensor factorization. We prove algebraic isomorphism with standard CNN, RNN and Transformer layers under mild assumptions. Empirical evaluations on image classification (MNIST, CIFAR-10/100, Tiny ImageNet), time-series forecasting (ETTh1, Electricity Load Diagrams) and language modeling/classification (AG News, WikiText-2, Penn Treebank) confirm that sparse-matrix…
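The triangular structure described in the abstract can be illustrated with a small sketch. This is not the paper's construction, which is not shown on this page; it only demonstrates two standard identities under simplifying assumptions: a 1-D convolution with zero padding on the right equals multiplication by a banded upper-triangular matrix, and a scalar linear recurrence h_t = a*h_{t-1} + b*x_t (no nonlinearity, h_0 = 0) equals multiplication by a lower-triangular matrix. All function names are hypothetical.

```python
# Illustrative sketch only: names and simplifications are assumptions,
# not taken from the paper.

def conv_matrix(kernel, n):
    """Banded upper-triangular n x n matrix T with T[i][i + k] = kernel[k]."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            if i + k < n:  # entries past the right edge are dropped (zero padding)
                T[i][i + k] = w
    return T

def recurrence_matrix(a, b, n):
    """Lower-triangular n x n matrix L with L[t][s] = a**(t - s) * b for s <= t."""
    return [[(a ** (t - s)) * b if s <= t else 0.0 for s in range(n)]
            for t in range(n)]

def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def conv_direct(kernel, x):
    """Sliding-window convolution (cross-correlation form), zero-padded on the right."""
    n = len(x)
    return [sum(w * x[i + k] for k, w in enumerate(kernel) if i + k < n)
            for i in range(n)]

def recurrence_direct(a, b, x):
    """Step-by-step linear recurrence h_t = a * h_{t-1} + b * x_t with h_0 = 0."""
    h, out = 0.0, []
    for v in x:
        h = a * h + b * v
        out.append(h)
    return out

x = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -1.0]
a, b = 0.5, 2.0

conv_via_matrix = matvec(conv_matrix(kernel, len(x)), x)
rnn_via_matrix = matvec(recurrence_matrix(a, b, len(x)), x)

print(conv_via_matrix, conv_direct(kernel, x))
print(rnn_via_matrix, recurrence_direct(a, b, x))
```

In both cases the matrix product reproduces the step-by-step computation. Most entries of each matrix are zero or are determined by a few shared parameters, which is the sense in which such layers can be read as sparse matrix multiplications.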
Taxonomy
Topics: Neural Networks and Applications
Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Attention Is All You Need · Convolution
