Unified Sparse-Matrix Representations for Diverse Neural Architectures

Yuzhou Zhu

arXiv:2506.01966 · cs.LG · July 24, 2025

Open Access

TL;DR

This paper introduces a unified sparse-matrix framework that expresses CNN, RNN, and Transformer layers as sparse matrix multiplications, exposing the structure they share and potentially enabling more efficient architecture design.

Contribution

The paper presents an algebraic framework that casts convolutional, recurrent, and self-attention layers as sparse matrix operations, supported by proofs of algebraic isomorphism with the standard layers and by empirical validation.

Findings

01. Sparse-matrix formulations match or outperform the native models.

02. Sparse-matrix models converge in a similar or smaller number of epochs.

03. The framework aligns with GPU parallelism and existing optimization tools.

Abstract

Deep neural networks employ specialized architectures for vision, sequential and language tasks, yet this proliferation obscures their underlying commonalities. We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. Convolution is realized via an upper-triangular weight matrix performing first-order transformations; recurrence emerges from a lower-triangular matrix encoding stepwise updates; attention arises naturally as a third-order tensor factorization. We prove algebraic isomorphism with standard CNN, RNN and Transformer layers under mild assumptions. Empirical evaluations on image classification (MNIST, CIFAR-10/100, Tiny ImageNet), time-series forecasting (ETTh1, Electricity Load Diagrams) and language modeling/classification (AG News, WikiText-2, Penn Treebank) confirm that sparse-matrix…
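To make the triangular-matrix claims in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's construction, whose details are not given in the abstract. It rests on three assumptions: a single-channel 1D convolution in cross-correlation form with zero padding on the right, a scalar linear recurrence with no nonlinearity, and dense list-of-lists matrices instead of a sparse format. All function names (`conv_matrix`, `recurrence_matrix`, and so on) are hypothetical. The attention case, described in the abstract as a third-order tensor factorization, is not attempted here.

```python
# Illustrative sketch only: the constructions below are assumptions chosen for
# clarity, not the paper's implementation.

def matvec(matrix, vector):
    """Multiply a dense list-of-lists matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def conv_matrix(kernel, n):
    """Build an n x n upper-triangular banded matrix T such that
    (T x)[i] = sum_k kernel[k] * x[i + k], with x treated as zero past its end."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            if i + k < n:
                T[i][i + k] = w  # entries sit on or above the diagonal
    return T

def direct_conv(kernel, x):
    """Reference sliding-window convolution with the same padding rule."""
    n = len(x)
    return [
        sum(w * x[i + k] for k, w in enumerate(kernel) if i + k < n)
        for i in range(n)
    ]

def recurrence_matrix(a, b, n):
    """Build an n x n lower-triangular matrix L for the linear recurrence
    h[t] = a * h[t-1] + b * x[t] with h[-1] = 0, so that h = L x.
    Unrolling the recurrence gives L[t][s] = b * a**(t - s) for s <= t."""
    return [
        [b * a ** (t - s) if s <= t else 0.0 for s in range(n)]
        for t in range(n)
    ]

def direct_recurrence(a, b, x):
    """Reference step-by-step evaluation of the same recurrence."""
    h, states = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        states.append(h)
    return states

x = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -1.0]

conv_out = matvec(conv_matrix(kernel, len(x)), x)
rec_out = matvec(recurrence_matrix(0.5, 2.0, len(x)), x)

print(conv_out, direct_conv(kernel, x))
print(rec_out, direct_recurrence(0.5, 2.0, x))
```

In both cases the matrix product reproduces the step-by-step computation. The convolution matrix is nonzero only in a narrow band above the diagonal, so a sparse storage format would hold only `len(kernel)` values per row. The recurrence matrix from this particular unrolling is lower-triangular but generally dense below the diagonal.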

Taxonomy

Topics: Neural Networks and Applications

Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Attention Is All You Need · Convolution