Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach
Paolo Andreini, Alessandra Bernardi, Monica Bianchini, Barbara Toniella Corradini, Sara Marziali, Giacomo Nunziati, Franco Scarselli

TL;DR
This paper introduces StrassenNet, a neural architecture that learns fast matrix multiplication algorithms. It recovers Strassen's algorithm for 2x2 multiplication and gives numerical evidence about the minimal rank for 3x3 multiplication.
Contribution
The paper presents a neural network approach to discover and analyze low-rank tensor decompositions for fast matrix multiplication, including recovering Strassen's algorithm and exploring minimal ranks.
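As a rough illustration of the search problem, a rank-r candidate algorithm can be written as three factor matrices and scored by how closely it reproduces true matrix products. The sketch below is a minimal NumPy version under assumptions of its own: the names `bilinear_matmul`, `mse_loss` and `naive_factors`, the row-major flattening, and the mean-squared-error loss are illustrative choices, not details taken from the paper.

```python
import numpy as np

def bilinear_matmul(U, V, W, A, B):
    """Rank-r bilinear model: r scalar products, then a linear recombination.

    U, V, W all have shape (n*n, r); this layout is an assumption.
    """
    n = A.shape[0]
    a = A.reshape(-1)            # row-major flattening of A
    b = B.reshape(-1)            # row-major flattening of B
    m = (U.T @ a) * (V.T @ b)    # the r intermediate products
    return (W @ m).reshape(n, n)

def mse_loss(U, V, W, n, num_samples=64, seed=0):
    """Mean squared error against the exact product on random matrices."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        A = rng.standard_normal((n, n))
        B = rng.standard_normal((n, n))
        total += np.mean((bilinear_matmul(U, V, W, A, B) - A @ B) ** 2)
    return total / num_samples

def naive_factors(n):
    """Schoolbook algorithm as a rank n**3 decomposition (one column per a_ik * b_kj)."""
    r = n ** 3
    U = np.zeros((n * n, r))
    V = np.zeros((n * n, r))
    W = np.zeros((n * n, r))
    col = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                U[i * n + k, col] = 1.0   # picks a_ik
                V[k * n + j, col] = 1.0   # picks b_kj
                W[i * n + j, col] = 1.0   # adds the product into c_ij
                col += 1
    return U, V, W
```

The naive factors use n^3 products (8 for 2x2, 27 for 3x3) and reproduce the product exactly, so their loss is zero up to rounding. A fast algorithm corresponds to factors with fewer columns that reach the same zero loss.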
Findings
Successfully reproduces Strassen's algorithm for 2x2 multiplication.
Identifies a numerical threshold at rank 23 for 3x3 multiplication.
Preliminary results on border-rank decompositions align with known bounds.
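The border-rank notion in the last finding can be illustrated on a classical small tensor that is unrelated to the paper's experiments. The tensor x⊗x⊗y + x⊗y⊗x + y⊗x⊗x has rank 3, yet it is the limit, as ε goes to 0, of a difference of two rank-one terms scaled by 1/ε. The sketch below checks this numerically; the helper names `outer3` and `approx_error` are hypothetical, and the construction is a textbook example rather than the paper's parametrisation.

```python
import numpy as np

def outer3(u, v, w):
    """Rank-one order-3 tensor u ⊗ v ⊗ w."""
    return np.einsum("i,j,k->ijk", u, v, w)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# Target tensor: rank 3, but approximable arbitrarily well with 2 rank-one terms.
target = outer3(x, x, y) + outer3(x, y, x) + outer3(y, x, x)

def approx_error(eps):
    """Max entrywise error of the two-term approximation for a given eps."""
    z = x + eps * y
    approx = (outer3(z, z, z) - outer3(x, x, x)) / eps
    return np.max(np.abs(approx - target))

# The error shrinks roughly in proportion to eps.
errors = {eps: approx_error(eps) for eps in (1e-1, 1e-2, 1e-3)}
```

The coefficients of the two terms grow like 1/ε while the error shrinks like ε, which is the usual trade-off in approximate (border-rank) decompositions.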
Abstract
Fast matrix multiplication can be described as searching for low-rank decompositions of the matrix-multiplication tensor. We design a neural architecture, StrassenNet, which reproduces the Strassen algorithm for 2x2 multiplication. Across many independent runs the network always converges to a rank-7 tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on 3x3 multiplication with rank r. Our experiments reveal a clear numerical threshold: models with r = 23 attain significantly lower validation error than those with r < 23, suggesting that r = 23 could actually be the smallest effective rank of the 3x3 matrix multiplication tensor. We also sketch an extension of the method to border-rank decompositions via an ε-parametrisation and report preliminary results…
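To make the rank-7 statement concrete, the sketch below writes the textbook form of Strassen's seven products as factor matrices and checks that they sum exactly to the 2x2 matrix-multiplication tensor. The layout (row-major flattening, the names `U`, `V`, `W` and `matmul_tensor`) is an assumption chosen for illustration; the factors are the classical ones, not output from the paper's network.

```python
import numpy as np

def matmul_tensor(n):
    """Tensor T with T[a_ik, b_kj, c_ij] = 1, using row-major flat indices."""
    T = np.zeros((n * n, n * n, n * n), dtype=int)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                T[i * n + k, k * n + j, i * n + j] = 1
    return T

# Columns index the products M1..M7; rows index entries (11, 12, 21, 22).
U = np.array([            # coefficients applied to A
    [ 1, 0, 0,  1],       # M1 = (A11 + A22)(B11 + B22)
    [ 0, 0, 1,  1],       # M2 = (A21 + A22) B11
    [ 1, 0, 0,  0],       # M3 = A11 (B12 - B22)
    [ 0, 0, 0,  1],       # M4 = A22 (B21 - B11)
    [ 1, 1, 0,  0],       # M5 = (A11 + A12) B22
    [-1, 0, 1,  0],       # M6 = (A21 - A11)(B11 + B12)
    [ 0, 1, 0, -1],       # M7 = (A12 - A22)(B21 + B22)
]).T
V = np.array([            # coefficients applied to B
    [ 1, 0, 0,  1],
    [ 1, 0, 0,  0],
    [ 0, 1, 0, -1],
    [-1, 0, 1,  0],
    [ 0, 0, 0,  1],
    [ 1, 1, 0,  0],
    [ 0, 0, 1,  1],
]).T
W = np.array([            # how M1..M7 combine into C
    [1,  0, 0, 1, -1, 0, 1],   # C11 = M1 + M4 - M5 + M7
    [0,  0, 1, 0,  1, 0, 0],   # C12 = M3 + M5
    [0,  1, 0, 1,  0, 0, 0],   # C21 = M2 + M4
    [1, -1, 1, 0,  0, 1, 0],   # C22 = M1 - M2 + M3 + M6
])

# Sum of 7 rank-one tensors u_r ⊗ v_r ⊗ w_r.
recon = np.einsum("ar,br,cr->abc", U, V, W)
is_exact = np.array_equal(recon, matmul_tensor(2))
```

Because all entries are integers, the comparison is exact rather than approximate. The same `matmul_tensor(3)` gives the 9x9x9 target for the 3x3 case discussed in the abstract.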
Taxonomy
Topics: Tensor decomposition and applications · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks
