A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads
Yu-Sheng Lin, Wei-Chao Chen, Chia-Lin Yang, Shao-Yi Chien

TL;DR
VectorMesh is a scalable dense tensor accelerator architecture that improves data exchange efficiency and reduces global buffer and DRAM fetches for DNN and vision workloads, outperforming state-of-the-art designs under roofline analysis.
Contribution
The paper introduces VectorMesh, an architecture that uses a butterfly network to connect processing elements and SRAM buffers within each tile and a FIFO mesh to exchange data between tiles, enabling efficient dense tensor processing.
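The summary does not describe how the butterfly network's switches are configured. As a rough illustration only, the sketch below models a generic radix-2 butterfly with N ports (N a power of two), log2(N) stages, and N/2 two-by-two switches per stage, each of which either passes or swaps its pair of inputs. The names `butterfly` and `xor_controls` are hypothetical, and the uniform per-stage control used here (which sends input i to output i XOR mask) is just one possible setting, not VectorMesh's routing scheme.

```python
def butterfly(data, controls):
    """Pass data through a radix-2 butterfly of 2x2 switches (generic model).

    controls[s][k] is True if switch k in stage s swaps its two inputs.
    Stage s pairs ports whose indices differ only in bit s.
    """
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "port count must be a power of two"
    stages = n.bit_length() - 1
    assert len(controls) == stages
    out = list(data)
    for s, stage_ctrl in enumerate(controls):
        stride = 1 << s
        # Lower port index of each of the n/2 switches in this stage.
        lows = [i for i in range(n) if not i & stride]
        assert len(stage_ctrl) == len(lows)
        for low, swap in zip(lows, stage_ctrl):
            if swap:
                high = low | stride
                out[low], out[high] = out[high], out[low]
    return out


def xor_controls(n, mask):
    """Uniform per-stage settings that send input i to output i XOR mask."""
    stages = n.bit_length() - 1
    return [[bool((mask >> s) & 1)] * (n // 2) for s in range(stages)]


print(butterfly(list(range(8)), xor_controls(8, 3)))  # [3, 2, 1, 0, 7, 6, 5, 4]
print(sum(len(stage) for stage in xor_controls(8, 3)))  # 12 switches
```

With 8 ports the network needs 3 stages of 4 switches (12 switches in total), compared with 64 crosspoints for a full 8x8 crossbar; the trade-off is that a single butterfly cannot realize every permutation of its inputs.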
Findings
Reduces global buffer fetches by 2 to 22 times and DRAM fetches by up to 5 times.
Outperforms state-of-the-art architectures on CNN, GEMM, and spatial matching workloads according to the roofline model.
Supports a wide variety of DNN and computer vision workloads efficiently.
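To make the roofline argument concrete, the sketch below computes attainable throughput as min(peak, arithmetic intensity x bandwidth), where arithmetic intensity is the number of operations per byte of DRAM traffic. All numbers (a 512 GOP/s peak, 32 GB/s of bandwidth, a layer with 2e9 operations and 200 MB of traffic) are hypothetical and not taken from the paper; the point is only that cutting DRAM traffic by a factor r multiplies arithmetic intensity by r, which can move a memory-bound workload up to the compute roof.

```python
def arithmetic_intensity(ops, dram_bytes):
    """Operations performed per byte fetched from DRAM."""
    return ops / dram_bytes


def attainable_gops(peak_gops, bandwidth_gbs, intensity):
    """Roofline model: performance is capped by compute or by memory bandwidth."""
    return min(peak_gops, intensity * bandwidth_gbs)


def ridge_point(peak_gops, bandwidth_gbs):
    """Intensity (ops/byte) at which the memory roof meets the compute roof."""
    return peak_gops / bandwidth_gbs


# Hypothetical accelerator and layer, for illustration only.
PEAK_GOPS = 512.0
BANDWIDTH_GBS = 32.0
OPS = 2e9

baseline = arithmetic_intensity(OPS, 200e6)     # 10 ops/byte
reduced = arithmetic_intensity(OPS, 200e6 / 5)  # 5x fewer DRAM fetches: 50 ops/byte

print(ridge_point(PEAK_GOPS, BANDWIDTH_GBS))                # 16.0
print(attainable_gops(PEAK_GOPS, BANDWIDTH_GBS, baseline))  # 320.0 (memory-bound)
print(attainable_gops(PEAK_GOPS, BANDWIDTH_GBS, reduced))   # 512.0 (compute-bound)
```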
Abstract
We propose a dense tensor accelerator called VectorMesh, a scalable, memory-efficient architecture that can support a wide variety of DNN and computer vision workloads. Its building block is a tile execution unit (TEU), which includes dozens of processing elements (PEs) and SRAM buffers connected through a butterfly network. A mesh of FIFOs between the TEUs facilitates data exchange between tiles and promotes local data to global visibility. According to the roofline model, our design outperforms state-of-the-art architectures on CNN, GEMM, and spatial matching algorithms. It can reduce global buffer fetches by 2 to 22 times and DRAM fetches by up to 5 times.
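The abstract says the FIFO mesh "promotes local data to global visibility" but gives no protocol. The sketch below is a hypothetical toy model, not the paper's microarchitecture: one row of tiles joined by bounded FIFOs in both directions, where each tile forwards whatever it receives one hop per cycle. After n - 1 cycles every tile in an n-tile row has seen every other tile's local value. The `Fifo` class, `row_allgather` function, and `depth` parameter are illustrative names introduced here.

```python
from collections import deque


class Fifo:
    """Bounded FIFO between two neighbouring tiles (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def push(self, item):
        if len(self.items) >= self.depth:
            raise OverflowError("FIFO full")
        self.items.append(item)

    def pop(self):
        # Returns None when the FIFO is empty.
        return self.items.popleft() if self.items else None


def row_allgather(local, depth=2):
    """Forward values hop by hop until every tile in the row has seen all of them.

    Values must be hashable and not None.
    """
    n = len(local)
    east = [Fifo(depth) for _ in range(n - 1)]  # east[i]: tile i -> tile i+1
    west = [Fifo(depth) for _ in range(n - 1)]  # west[i]: tile i+1 -> tile i
    seen = [{v} for v in local]

    # Cycle 0: every tile injects its own value toward both neighbours.
    for i, v in enumerate(local):
        if i + 1 < n:
            east[i].push(v)
        if i > 0:
            west[i - 1].push(v)

    # n - 1 further cycles let a value cross the whole row.
    for _ in range(n - 1):
        # Pop all inputs first so a value moves at most one hop per cycle.
        from_west = [east[i - 1].pop() if i > 0 else None for i in range(n)]
        from_east = [west[i].pop() if i + 1 < n else None for i in range(n)]
        for i in range(n):
            if from_west[i] is not None:
                seen[i].add(from_west[i])
                if i + 1 < n:
                    east[i].push(from_west[i])  # keep moving east
            if from_east[i] is not None:
                seen[i].add(from_east[i])
                if i > 0:
                    west[i - 1].push(from_east[i])  # keep moving west
    return seen


tiles = row_allgather(["a", "b", "c", "d"])
print([sorted(s) for s in tiles])  # every tile: ['a', 'b', 'c', 'd']
```

Because each FIFO is drained before it is refilled within a cycle, its occupancy never exceeds one entry in this model. Repeating the pass along each column would extend visibility to a full 2D mesh; whether VectorMesh schedules its exchanges this way is not stated in the abstract.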
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · CCD and CMOS Imaging Sensors
