TL;DR
FuSeConv is a fully separable convolution whose computation maps well onto systolic arrays. It gives 3x-7x faster MobileNet inference on systolic arrays with comparable ImageNet accuracy, closing the gap between efficient network design and hardware accelerators.
Contribution
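The paper proposes FuSeConv, a drop-in replacement for depthwise separable convolution that decomposes the convolution into separable 1D convolutions along the spatial and depth dimensions, so that the computation uses a systolic array efficiently.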
Findings
Achieves 3x-7x speed-up on systolic arrays with MobileNet.
Maintains comparable accuracy on ImageNet.
Shows the potential of hardware-aware neural operator search.
Abstract
Both efficient neural networks and hardware accelerators are being explored to speed up DNN inference on edge devices. For example, MobileNet uses depthwise separable convolution to achieve much lower latency, while systolic arrays provide much higher performance per watt. Interestingly, however, the combination of these two ideas is inefficient: the computational patterns of depthwise separable convolution are not systolic and lack the data reuse needed to saturate the systolic array's constrained dataflow. In this paper, we propose FuSeConv (Fully-Separable Convolution) as a drop-in replacement for depthwise separable convolution. FuSeConv generalizes the decomposition of convolutions fully to separable 1D convolutions along spatial and depth dimensions. The resultant computation is systolic and efficiently utilizes the systolic array with a slightly modified dataflow. With FuSeConv, we achieve…
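The decomposition described in the abstract can be illustrated with a minimal pure-Python sketch. It rests on assumptions the abstract does not state: each input channel gets one 1xK row filter and one Kx1 column filter, borders are zero-padded, the two sets of outputs are concatenated, and a 1x1 pointwise step then mixes along depth. The function names (`conv1d_rows`, `conv1d_cols`, `pointwise`, `fuse_block`) are hypothetical and chosen only for this illustration; this is not the authors' implementation.

```python
def conv1d_rows(x, w):
    """Slide a 1xK filter along each row of a 2D map (zero padding, odd K)."""
    k = len(w)
    pad = k // 2
    out = []
    for row in x:
        padded = [0] * pad + list(row) + [0] * pad
        out.append([sum(padded[j + t] * w[t] for t in range(k))
                    for j in range(len(row))])
    return out


def transpose(x):
    return [list(col) for col in zip(*x)]


def conv1d_cols(x, w):
    """Slide a Kx1 filter down each column: a row convolution on the transpose."""
    return transpose(conv1d_rows(transpose(x), w))


def pointwise(maps, weights):
    """1x1 convolution: a 1D mix along the depth (channel) axis.

    maps is a list of C equally sized 2D maps; weights has shape [C_out][C].
    """
    h, wid = len(maps[0]), len(maps[0][0])
    return [[[sum(wc * m[i][j] for wc, m in zip(w_out, maps))
              for j in range(wid)]
             for i in range(h)]
            for w_out in weights]


def fuse_block(channels, row_filters, col_filters, pw_weights):
    """Assumed arrangement: row and column 1D filters on every channel,
    outputs concatenated (2*C maps), then a pointwise mix."""
    row_out = [conv1d_rows(c, w) for c, w in zip(channels, row_filters)]
    col_out = [conv1d_cols(c, w) for c, w in zip(channels, col_filters)]
    return pointwise(row_out + col_out, pw_weights)


# One 3x3 input channel, all-ones 1x3 and 3x1 filters, one output channel.
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = fuse_block([x], [[1, 1, 1]], [[1, 1, 1]], [[1, 1]])
print(y[0][1][1])  # centre value: 15 from the row filter + 15 from the column filter = 30
```

Every step in this sketch is a 1D sliding dot product, either along a row, along a column, or along the channel axis. This is the sense in which the operator is "fully separable"; how those 1D operations are scheduled on a systolic array is not covered here.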
Taxonomy
Methods: Pointwise Convolution · Convolution · Depthwise Convolution · Depthwise Separable Convolution
