Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA
Fangqiang Du, Sixuan Chong, Zixuan Huang, Rui Qin, Fengnan Mi, Caibao Hu, Jiangang Chen

TL;DR
This paper introduces a low-latency, parallelizable FPGA architecture for real-time large-scale SVD computation, reducing memory usage and increasing speed for data stream applications.
Contribution
It presents a novel data stream-based SVD algorithm (DSB Jacobi) that significantly reduces on-chip memory and enhances computational efficiency on FPGA.
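For readers unfamiliar with Jacobi-type SVD, the sketch below shows the classical one-sided (Hestenes) Jacobi iteration in NumPy. It is a textbook baseline only, not the paper's DSB Jacobi algorithm or its FPGA dataflow; the function name `one_sided_jacobi_svd` and the parameters `tol` and `max_sweeps` are assumptions made for illustration.

```python
import numpy as np

def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """Textbook one-sided Jacobi SVD (illustrative, not the paper's DSB Jacobi).

    Returns U, sigma, V such that a is approximately U @ diag(sigma) @ V.T.
    """
    u = np.array(a, dtype=float)   # working copy; columns converge to U * sigma
    n = u.shape[1]
    v = np.eye(n)                  # accumulates the column rotations
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = u[:, p] @ u[:, p]
                beta = u[:, q] @ u[:, q]
                gamma = u[:, p] @ u[:, q]
                # Skip column pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta != 0:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                else:
                    t = 1.0
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same plane rotation to the working matrix and to V.
                for m in (u, v):
                    mp = m[:, p].copy()
                    m[:, p] = c * mp - s * m[:, q]
                    m[:, q] = s * mp + c * m[:, q]
        if not rotated:
            break
    # Column norms are the singular values; sort them in descending order.
    sigma = np.linalg.norm(u, axis=0)
    order = np.argsort(sigma)[::-1]
    sigma, u, v = sigma[order], u[:, order], v[:, order]
    nonzero = sigma > 0
    u[:, nonzero] /= sigma[nonzero]
    return u, sigma, v
```

Each rotation touches only two columns, so disjoint column pairs can be processed independently. That independence is the general property that makes Jacobi methods attractive for parallel hardware.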
Findings
Reduces on-chip RAM consumption by 41.5%.
Improves computational efficiency by a factor of 23.
Enables real-time processing of large-scale data streams.
Abstract
Singular value decomposition (SVD) is widely used for dimensionality reduction and noise suppression, and it plays a pivotal role in numerous scientific and engineering applications. As the dimensions of the matrix grow rapidly, the computational cost increases significantly, posing a serious challenge to the efficiency of data analysis and signal processing systems, especially in time-sensitive scenarios involving large-scale datasets. Although various dedicated hardware architectures have been proposed to accelerate the computation of intensive SVD, many of these designs suffer from limited scalability and high consumption of on-chip memory resources. Moreover, they typically overlook the computational and data transfer challenges associated with SVD, making them unsuitable for real-time processing of large-scale data stream matrices in embedded systems. In this paper, we propose a…
Taxonomy
Topics: Numerical Methods and Algorithms · Low-power high-performance VLSI design · Parallel Computing and Optimization Techniques
