Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA

Fangqiang Du; Sixuan Chong; Zixuan Huang; Rui Qin; Fengnan Mi; Caibao Hu; Jiangang Chen

arXiv:2511.12461 · cs.DC · November 26, 2025

Open Access

TL;DR

This paper introduces a low-latency, parallelizable FPGA architecture for real-time large-scale SVD computation, reducing memory usage and increasing speed for data stream applications.

Contribution

It presents a novel data-stream-based SVD algorithm (DSB Jacobi) that significantly reduces on-chip memory usage and improves computational efficiency on FPGA.
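
The summary names DSB Jacobi but does not describe its update rules. As background only, here is a minimal sketch of the classic cyclic one-sided (Hestenes) Jacobi SVD, a common starting point for Jacobi-based hardware SVD designs. It is not the authors' DSB Jacobi, and the function name `one_sided_jacobi_svd` and its `tol` and `max_sweeps` parameters are illustrative assumptions.

```python
import math


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=50):
    """Return the singular values of matrix a (list of rows, m x n), descending.

    Illustrative sketch of the classic cyclic one-sided (Hestenes) Jacobi method;
    this is NOT the paper's DSB Jacobi algorithm. tol and max_sweeps are
    hypothetical parameters chosen for this example.
    """
    m, n = len(a), len(a[0])
    # Column-major copy so each column a_k is its own list.
    cols = [[a[r][k] for r in range(m)] for k in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        # One cyclic sweep over every column pair (i, j) with i < j.
        for i in range(n - 1):
            for j in range(i + 1, n):
                ci, cj = cols[i], cols[j]
                alpha = sum(x * x for x in ci)              # ||a_i||^2
                beta = sum(x * x for x in cj)               # ||a_j||^2
                gamma = sum(x * y for x, y in zip(ci, cj))  # a_i . a_j
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already numerically orthogonal
                rotated = True
                # Smaller root of t^2 + 2*zeta*t - 1 = 0 gives a stable rotation.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Plane rotation that makes columns i and j orthogonal.
                cols[i] = [c * x - s * y for x, y in zip(ci, cj)]
                cols[j] = [s * x + c * y for x, y in zip(ci, cj)]
        if not rotated:
            break  # a full sweep with no rotations means convergence
    # After convergence the singular values are the column norms.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


print(one_sided_jacobi_svd([[3.0, 0.0], [4.0, 5.0]]))  # approx. sqrt(45), sqrt(5)
```

Rotations on disjoint column pairs, such as (0, 1) and (2, 3), touch different columns and can run concurrently; this independence is what makes Jacobi-type SVD attractive for parallel hardware. The sketch applies them sequentially for clarity.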

Findings

01. Reduces on-chip RAM consumption by 41.5%.

02. Improves computational efficiency by a factor of 23.

03. Enables real-time processing of large-scale data streams.

Abstract

Singular value decomposition (SVD) is widely used for dimensionality reduction and noise suppression, and it plays a pivotal role in numerous scientific and engineering applications. As matrix dimensions grow, the computational cost increases sharply, posing a serious challenge to the efficiency of data analysis and signal processing systems, especially in time-sensitive scenarios involving large-scale datasets. Although various dedicated hardware architectures have been proposed to accelerate computationally intensive SVD, many of these designs suffer from limited scalability and high consumption of on-chip memory resources. Moreover, they typically overlook the computational and data transfer challenges associated with SVD, making them unsuitable for real-time processing of large-scale data stream matrices in embedded systems. In this paper, we propose a…

Taxonomy

Topics: Numerical Methods and Algorithms · Low-power high-performance VLSI design · Parallel Computing and Optimization Techniques