AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP
Enlai Li, Zhe Lin, Sharad Sinha, Wei Zhang

TL;DR
AP-DRL is an automatic, hardware-aware framework that optimizes task partitioning and quantization for deep reinforcement learning on AMD Versal ACAP, significantly accelerating training.
Contribution
It introduces a task partitioning and quantization framework that leverages the heterogeneous architecture of AMD Versal ACAP for efficient DRL training.
Findings
Achieves up to 4.17× speedup over programmable logic baselines.
Achieves up to 3.82× speedup over AI Engine baselines.
Preserves training convergence while accelerating training.
Abstract
Deep reinforcement learning has demonstrated remarkable success across various domains. However, the tight coupling between training and inference processes makes accelerating DRL training an essential challenge for DRL optimization. Two key issues hinder efficient DRL training: (1) the significant variation in computational intensity across different DRL algorithms and even among operations within the same algorithm complicates hardware platform selection, while (2) DRL's wide dynamic range could lead to substantial reward errors with conventional FP16+FP32 mixed-precision quantization. While existing work has primarily focused on accelerating DRL for specific computing units or optimizing inference-stage quantization, we propose AP-DRL to address the above challenges. AP-DRL is an automatic task partitioning framework that harnesses the heterogeneous architecture of AMD Versal ACAP…
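The dynamic-range concern in point (2) can be illustrated with a minimal sketch. This is not the paper's quantization scheme; it only shows how a value behaves when rounded to IEEE half precision (FP16). The helper `to_fp16` and the sample reward magnitudes are hypothetical and chosen purely for illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (hypothetical helper).

    Python's struct module raises OverflowError for values outside the FP16
    range, so out-of-range inputs are mapped to +/-inf here, which mirrors
    what a round-to-nearest hardware cast would produce.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")

# Hypothetical reward magnitudes spanning a wide dynamic range.
samples = [0.5, 1.0, 2049.0, 70000.0]
for value in samples:
    rounded = to_fp16(value)
    print(f"{value:>10} -> {rounded:>10}  abs error = {abs(rounded - value)}")
```

Small values such as 0.5 and 1.0 survive the round trip exactly, while 2049.0 is rounded to 2048.0 because FP16 has a spacing of 2 in that range, and 70000.0 exceeds the largest finite FP16 value (65504) and becomes infinity. Errors of this kind in returns or value targets are one way a wide dynamic range can distort the reward signal under reduced precision.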
