Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures

Bin Xu; Ayan Banerjee; Sandeep Gupta

arXiv:2512.06113 · cs.AR · December 9, 2025

TL;DR

This paper introduces MERINDA, an FPGA-based framework that accelerates Model Recovery (MR) by restructuring its computation as a streaming dataflow pipeline, cutting cycle counts by up to 6.3x relative to an FPGA-based LTC baseline and enabling real-time performance in physical AI applications.

Contribution

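The paper presents MERINDA, an FPGA-accelerated MR framework that restructures computation as a streaming dataflow pipeline to sustain high throughput and real-time performance. It targets the inefficiencies GPUs show on MR: iterative dependencies, kernel-launch overheads, underutilized memory bandwidth, and high data-movement latency.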

Findings

01. Up to 6.3x fewer cycles than an FPGA-based LTC baseline

02. Real-time performance for time-critical physical systems

03. Less off-chip traffic and fewer synchronization bottlenecks

Abstract

Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutilized memory bandwidth, and high data-movement latency. We present MERINDA, an FPGA-accelerated MR framework that restructures computation as a streaming dataflow pipeline. MERINDA exploits on-chip locality through BRAM tiling, fixed-point kernels, and the concurrent use of LUT fabric and carry-chain adders to expose fine-grained spatial parallelism while minimizing off-chip traffic. This hardware-aware formulation removes synchronization bottlenecks and sustains high throughput across the iterative updates in MR. On representative MR workloads, MERINDA delivers up to 6.3x fewer cycles than an FPGA-based LTC baseline, enabling real-time performance for time-critical physical systems.
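
As a rough illustration of two ideas named in the abstract, fixed-point kernels and tile-by-tile processing of data held in local memory, the sketch below computes a matrix-vector product in software. It is not MERINDA's implementation: the Q-format width (`FRAC_BITS`), the tile size (`TILE`), and every function name are assumptions chosen for the example, and a Python loop only mimics the arithmetic, not the spatial parallelism of an FPGA pipeline.

```python
# Illustrative sketch only: fixed-point, tile-by-tile matrix-vector product.
# FRAC_BITS, TILE, and all function names are assumptions, not from the paper.

FRAC_BITS = 12  # assumed number of fractional bits in the fixed-point format
TILE = 4        # assumed tile width (stand-in for one on-chip memory tile)


def to_fixed(x: float) -> int:
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << FRAC_BITS)


def tiled_matvec_fixed(matrix, vector):
    """Compute matrix @ vector in fixed point, one column tile at a time."""
    q_vec = [to_fixed(v) for v in vector]
    acc = [0] * len(matrix)                     # wide integer accumulators
    for start in range(0, len(vector), TILE):   # visit tiles in order
        tile = q_vec[start:start + TILE]        # local copy of the current tile
        for r, row in enumerate(matrix):
            for k, qv in enumerate(tile):
                acc[r] += to_fixed(row[start + k]) * qv
    # Each product carries 2 * FRAC_BITS fractional bits; rescale once at the end.
    return [to_float(a >> FRAC_BITS) for a in acc]
```

Accumulating in wide integers and rescaling once at the end avoids rounding after every multiply. Each tile of the input vector is read once and reused across all rows, which is the locality that on-chip tiling is meant to exploit.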

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Model Reduction and Neural Networks