Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA
Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

TL;DR
This paper presents a reconfigurable FPGA-based memory system optimized for sparse tensor computations, significantly reducing memory access time and improving performance for the MTTKRP kernel in tensor decompositions.
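For reference, the sketch below shows what the MTTKRP kernel computes for a third-order sparse tensor stored as a list of nonzeros (COO format). In the usual notation this is the mode-1 product M = X_(1)(C ⊙ B). The function name and the COO layout are illustrative choices, not the paper's implementation. The two data-dependent row reads, `B[j]` and `C[k]`, are the irregular accesses that make the kernel hard to serve from memory.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def mttkrp_mode0(nnz: List[Tuple[Coord, float]],
                 B: List[List[float]],
                 C: List[List[float]],
                 num_rows: int) -> List[List[float]]:
    """Mode-0 MTTKRP for a 3rd-order sparse tensor in COO format (illustrative).

    For every nonzero X[i, j, k], add X[i, j, k] * (B[j, :] * C[k, :])
    (elementwise product of two factor-matrix rows) into row i of M.
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for (i, j, k), val in nnz:
        b_row = B[j]  # data-dependent read of factor matrix B
        c_row = C[k]  # data-dependent read of factor matrix C
        out = M[i]    # read-modify-write of the output row
        for r in range(rank):
            out[r] += val * b_row[r] * c_row[r]
    return M


if __name__ == "__main__":
    nnz = [((0, 1, 0), 2.0), ((1, 0, 1), 3.0), ((0, 0, 1), 1.0)]
    B = [[1.0, 2.0], [3.0, 4.0]]  # J = 2 rows, rank R = 2
    C = [[5.0, 6.0], [7.0, 8.0]]  # K = 2 rows, rank R = 2
    print(mttkrp_mode0(nnz, B, C, num_rows=2))  # [[37.0, 64.0], [21.0, 48.0]]
```

Only the nonzeros are visited, so the work scales with the number of nonzeros times the rank. The order in which `j` and `k` appear, however, depends entirely on the tensor's sparsity pattern.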
Contribution
It introduces a multi-faceted, reconfigurable memory system that enhances data locality and access efficiency for MTTKRP on FPGA, outperforming existing memory controllers.
Findings
Improves memory access time by 3.5x over commercial memory controller IPs.
Achieves 2x speedup over cache-only systems.
Attains 1.26x speedup over DMA-only systems.
Abstract
Tensor decomposition has become an essential tool in applications across many domains, including machine learning. Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is one of the most computationally expensive kernels in tensor computations. Despite offering significant computational parallelism, MTTKRP is a challenging kernel to optimize because of its irregular memory access characteristics. This paper focuses on a multi-faceted memory system that exploits the spatial and temporal locality of the MTTKRP data structures. Further, users can reconfigure our design depending on the behavior of the compute units used in the FPGA accelerator. Using a distributed cache and a Direct Memory Access (DMA) subsystem, our system efficiently accesses all the MTTKRP data structures while reducing the total memory access time. Moreover, our work improves the memory access time by 3.5x…
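To illustrate why combining a cache with DMA can help, the toy model below counts external-memory traffic under one possible split. It assumes, hypothetically, that the nonzeros, which are stored contiguously and read once, are streamed by a DMA engine in fixed-size bursts to exploit spatial locality. It also assumes that factor-matrix rows, which are reused in a data-dependent order, go through a small direct-mapped cache, one per factor matrix, to exploit temporal locality. The class and function names, the burst length, and the "one cache per matrix" arrangement are all assumptions made for illustration. The model does not describe the paper's actual architecture and does not reproduce its reported speedups. Output-row traffic is also left out.

```python
import math
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int, int]


class DirectMappedCache:
    """Hypothetical direct-mapped cache; each line holds one factor-matrix row."""

    def __init__(self, num_lines: int) -> None:
        self.tags: List[Optional[int]] = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, row: int) -> bool:
        """Return True on a hit; on a miss, install the row (one external fetch)."""
        line = row % len(self.tags)
        if self.tags[line] == row:
            self.hits += 1
            return True
        self.tags[line] = row
        self.misses += 1
        return False


def model_traffic(nnz: List[Coord], num_lines: int, burst_len: int) -> Dict[str, int]:
    """Count external transactions for a mode-0 MTTKRP pass (toy model)."""
    # Assumed DMA path: contiguous nonzeros fetched in bursts of burst_len.
    dma_bursts = math.ceil(len(nnz) / burst_len)
    # Assumed cache path: separate caches for factor matrices B (index j) and C (index k).
    cache_b = DirectMappedCache(num_lines)
    cache_c = DirectMappedCache(num_lines)
    for _i, j, k in nnz:
        cache_b.access(j)
        cache_c.access(k)
    return {
        "dma_bursts": dma_bursts,
        "row_fetches": cache_b.misses + cache_c.misses,
        "row_hits": cache_b.hits + cache_c.hits,
    }


if __name__ == "__main__":
    coords = [(0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 2, 1), (2, 1, 0)]
    print(model_traffic(coords, num_lines=4, burst_len=4))
    # {'dma_bursts': 2, 'row_fetches': 4, 'row_hits': 6}
```

Without any cache, the same five nonzeros would need 10 separate row fetches (two per nonzero). The cache cuts this to 4 only because rows `j = 1` and `k = 0` recur. On a tensor with little index reuse, the cache helps less, which is one intuition for making the memory system reconfigurable to the workload.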
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Computational Physics and Python Applications
