Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
Felix Dangel

TL;DR
This paper presents a tensor network perspective on convolutions that simplifies their analysis and enables efficient differentiation, curvature approximation, and hardware-efficient dropout, accelerating a KFAC variant by up to 4.5x.
Contribution
It introduces a tensor network framework for convolutions, facilitating diagrammatic reasoning, simplifying autodiff and curvature computations, and enhancing performance and memory efficiency.
Findings
Accelerates a KFAC variant by up to 4.5x
Removes memory overhead of standard implementations
Enables hardware-efficient tensor dropout
Abstract
Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the transfer of theoretical and algorithmic ideas to convolutions. We simplify convolutions by viewing them as tensor networks (TNs) that allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum. To demonstrate their simplicity and expressiveness, we derive diagrams of various autodiff operations and popular curvature approximations with full hyper-parameter support, batching, channel groups, and generalization to any convolution dimension. Further, we provide convolution-specific transformations based on the connectivity pattern that allow diagrams to be simplified before evaluation. Finally, we probe performance. Our TN implementation…
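To make the einsum view in the abstract concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes a 1D convolution (cross-correlation, as in deep learning frameworks) with no padding, dilation, or channel groups. The function names `index_pattern`, `conv1d_einsum`, `conv1d_weight_vjp`, and `conv1d_reference` are hypothetical and chosen only for illustration. The connectivity between input, output, and kernel positions is encoded in a binary tensor. The forward pass is then a single contraction, and the vector-Jacobian product with respect to the weight is obtained by swapping the weight for the output gradient in that contraction.

```python
import numpy as np


def index_pattern(in_size, kernel_size, stride=1):
    """Binary tensor P[i, o, k] = 1 iff input position i meets kernel position k at output o.

    Hypothetical helper; assumes no padding and no dilation.
    """
    out_size = (in_size - kernel_size) // stride + 1
    pattern = np.zeros((in_size, out_size, kernel_size))
    for o in range(out_size):
        for k in range(kernel_size):
            pattern[o * stride + k, o, k] = 1.0
    return pattern


def conv1d_einsum(x, w, stride=1):
    """Convolution as one contraction of input, index pattern, and weight.

    x: (batch, c_in, in_size), w: (c_out, c_in, kernel_size).
    """
    pattern = index_pattern(x.shape[-1], w.shape[-1], stride)
    return np.einsum("nci,iok,dck->ndo", x, pattern, w)


def conv1d_weight_vjp(x, grad_out, kernel_size, stride=1):
    """Weight VJP: same network, with the weight replaced by the output gradient."""
    pattern = index_pattern(x.shape[-1], kernel_size, stride)
    return np.einsum("nci,iok,ndo->dck", x, pattern, grad_out)


def conv1d_reference(x, w, stride=1):
    """Sliding-window reference used only to check the einsum version."""
    n, _, in_size = x.shape
    c_out, _, kernel_size = w.shape
    out_size = (in_size - kernel_size) // stride + 1
    out = np.zeros((n, c_out, out_size))
    for o in range(out_size):
        window = x[:, :, o * stride : o * stride + kernel_size]
        out[:, :, o] = np.einsum("nck,dck->nd", window, w)
    return out
```

Materializing the dense index pattern is wasteful for large inputs. The sketch only shows the structure of the contraction and says nothing about how an efficient implementation would evaluate it.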
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Computational Physics and Python Applications · Tensor decomposition and applications
Methods: Dropout · Convolution
