Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures
Bin Xu, Ayan Banerjee, Sandeep Gupta

TL;DR
This paper introduces MERINDA, an FPGA-based framework that accelerates Model Recovery (MR) by restructuring its computation as a streaming dataflow pipeline, cutting cycle counts by up to 6.3x relative to an FPGA-based LTC baseline and enabling real-time performance in physical AI applications.
Contribution
The paper presents MERINDA, an FPGA-accelerated Model Recovery (MR) framework that restructures computation as a streaming dataflow pipeline to sustain high throughput and real-time performance, addressing the inefficiencies GPUs show on MR workloads.
Findings
Up to 6.3x fewer cycles than an FPGA-based LTC baseline on representative MR workloads
Real-time performance for time-critical physical systems
Less off-chip traffic and no synchronization bottlenecks across the iterative updates in MR
Abstract
Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutilized memory bandwidth, and high data-movement latency. We present MERINDA, an FPGA-accelerated MR framework that restructures computation as a streaming dataflow pipeline. MERINDA exploits on-chip locality through BRAM tiling, fixed-point kernels, and the concurrent use of LUT fabric and carry-chain adders to expose fine-grained spatial parallelism while minimizing off-chip traffic. This hardware-aware formulation removes synchronization bottlenecks and sustains high throughput across the iterative updates in MR. On representative MR workloads, MERINDA delivers up to 6.3x fewer cycles than an FPGA-based LTC baseline, enabling real-time performance for time-critical physical systems.
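The abstract names fixed-point kernels and BRAM tiling without showing what they look like. The following Python sketch illustrates the general idea only: values are quantized to integers with a fixed number of fractional bits, and a matrix-vector product is accumulated one tile of columns at a time so that only a small working set is live at once. It is not MERINDA's implementation; the names `FRAC_BITS`, `TILE`, `to_fixed`, and `tiled_matvec_fixed`, as well as the chosen bit width and tile size, are assumptions made for illustration.

```python
# Illustrative sketch only: generic fixed-point arithmetic and tiling,
# not the kernels described in the paper.
FRAC_BITS = 12  # assumed number of fractional bits
TILE = 4        # assumed tile width, standing in for one on-chip tile


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def tiled_matvec_fixed(matrix, vector, tile=TILE, frac_bits=FRAC_BITS):
    """Fixed-point matrix-vector product, processed one column tile at a time."""
    q_vec = [to_fixed(v, frac_bits) for v in vector]
    acc = [0] * len(matrix)  # integer accumulators, one per output row
    for start in range(0, len(vector), tile):
        # One tile of the input vector: the small working set kept "on chip".
        v_tile = q_vec[start:start + tile]
        for r, row in enumerate(matrix):
            w_tile = [to_fixed(w, frac_bits) for w in row[start:start + tile]]
            # Integer multiply-accumulate; each product carries
            # 2 * frac_bits fractional bits.
            acc[r] += sum(w * v for w, v in zip(w_tile, v_tile))
    # Rescale once at the end instead of after every product.
    return [to_float(a, 2 * frac_bits) for a in acc]


def matvec_float(matrix, vector):
    """Floating-point reference used to check the fixed-point result."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]
```

Comparing `tiled_matvec_fixed` against `matvec_float` on small inputs shows the quantization error introduced by the chosen `FRAC_BITS`. Inputs that are exact multiples of 2^-12 reproduce the floating-point result exactly.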
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Model Reduction and Neural Networks
