An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
Hirohisa Watanabe, Mineto Tsukada, Hiroki Matsutani

TL;DR
This paper presents a lightweight FPGA-based reinforcement learning method using OS-ELM that is more resource-efficient and trains faster than conventional DQN, making it suitable for low-cost edge devices.
Contribution
It introduces an on-device reinforcement learning approach that combines OS-ELM with L2 regularization and spectral normalization, optimized for FPGA implementation on low-cost hardware.
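Because OS-ELM is a supervised, sequential learner, reinforcement learning has to be expressed as a stream of regression targets. One common Q-learning formulation, sketched below under that assumption, builds the target y = r + γ · max_a' Q(s', a') (or just r at a terminal state) for the action actually taken and leaves the other actions at their current predictions so that they contribute zero error. The function name and the "keep non-taken actions unchanged" convention are hypothetical illustrations, not necessarily the authors' exact scheme.

```python
from typing import List, Sequence


def q_learning_target(q_current: Sequence[float], q_next: Sequence[float],
                      action: int, reward: float, done: bool,
                      gamma: float = 0.99) -> List[float]:
    """Hypothetical helper: build a Q-learning regression target vector.

    q_current: current Q(s, .) predictions for every action
    q_next:    Q(s', .) predictions used for bootstrapping
    Only the taken action's entry is replaced; the rest are copied so the
    learner sees zero error on actions that were not taken.
    """
    target = list(q_current)
    bootstrap = 0.0 if done else gamma * max(q_next)  # no bootstrap at episode end
    target[action] = reward + bootstrap
    return target
```

For example, with Q(s, .) = [0.5, 0.2], Q(s', .) = [1.0, 2.0], action 1, reward 1.0, and γ = 0.9, the target becomes roughly [0.5, 2.8]; the resulting vector can then be fed to a sequential learner as an ordinary (input, target) pair.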
Findings
Achieves 29.77x faster training on CartPole-v0 compared to DQN.
Demonstrates stable learning with spectral normalization and L2 regularization.
Successfully implements reinforcement learning on PYNQ-Z1 FPGA platform.
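Spectral normalization, credited above with stabilizing learning, rescales a weight matrix W to W / σ_max(W) so its largest singular value is 1, which bounds how much the layer can amplify its input. The sketch below estimates σ_max with a few steps of power iteration, the usual way this is done in practice. Which layer(s) the paper normalizes and how many iterations it uses are not stated here, so the function name, `n_iter`, and `seed` are assumptions for illustration.

```python
import numpy as np


def spectral_normalize(w: np.ndarray, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Return w / sigma_max(w), with sigma_max estimated by power iteration.

    Hypothetical helper: n_iter and seed are illustrative parameters.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(w.shape[0])  # random start for the left singular vector
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)  # approximate right singular vector
        u = w @ v
        u /= np.linalg.norm(u)  # approximate left singular vector
    sigma = u @ w @ v  # estimate of the largest singular value (positive by construction)
    return w / sigma
```

After normalization, `np.linalg.norm(w_sn, 2)` (the matrix 2-norm, i.e. the largest singular value) should be close to 1; a fixed random start keeps the result deterministic.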
Abstract
DQN (Deep Q-Network) is a method that performs Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay, and they rely on backpropagation-based iterative optimization, which makes them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses a training algorithm based on OS-ELM (Online Sequential Extreme Learning Machine). In addition, we propose combining L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network stay within a certain range and the reinforcement learning becomes stable. The…
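To make "OS-ELM-based training without backpropagation" concrete, here is a minimal sketch of a textbook OS-ELM with L2 (ridge) regularization. The input weights and biases are random and fixed; only the output weights β are learned. An initial batch sets P = (H₀ᵀH₀ + λI)⁻¹ and β = P H₀ᵀ T₀, and each later sample applies a rank-1 recursive least-squares update. The sigmoid activation, uniform weight initialization, class and method names, and λ value are assumptions for illustration, not the paper's FPGA design.

```python
import numpy as np


class OSELM:
    """Minimal OS-ELM with L2 regularization (hypothetical sketch)."""

    def __init__(self, n_in, n_hidden, n_out, lam=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = rng.uniform(-1, 1, (n_in, n_hidden))  # fixed random input weights
        self.bias = rng.uniform(-1, 1, n_hidden)            # fixed random hidden biases
        self.lam = lam                                       # L2 regularization strength
        self.P = None
        self.beta = np.zeros((n_hidden, n_out))              # learned output weights

    def hidden(self, x):
        # Sigmoid hidden-layer activations H = g(x @ alpha + b); x has shape (n, n_in)
        return 1.0 / (1.0 + np.exp(-(x @ self.alpha + self.bias)))

    def init_train(self, x0, t0):
        # Initial ridge solution: P = (H0^T H0 + lam*I)^-1, beta = P H0^T T0
        h0 = self.hidden(x0)
        self.P = np.linalg.inv(h0.T @ h0 + self.lam * np.eye(h0.shape[1]))
        self.beta = self.P @ h0.T @ t0

    def seq_train(self, x, t):
        # One-sample update (x: (n_in,), t: (n_out,)); the only division is by a scalar
        h = self.hidden(x[None, :])                      # (1, n_hidden)
        ph = self.P @ h.T                                # (n_hidden, 1); P is symmetric
        self.P -= (ph @ ph.T) / (1.0 + (h @ ph).item())  # Sherman-Morrison update of P
        self.beta += self.P @ h.T @ (t[None, :] - h @ self.beta)

    def predict(self, x):
        return self.hidden(x) @ self.beta
```

With one sample per update, the sequential step needs no matrix inversion, which is presumably part of why this style of training suits small FPGAs. After processing all samples, β matches the batch ridge solution (HᵀH + λI)⁻¹HᵀT over the same data, up to floating-point error.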
Taxonomy
Topics: Machine Learning and ELM · Advanced Memory and Neural Computing · Adaptive Dynamic Programming Control
Methods: Experience Replay · Weight Decay · Q-Learning · Spectral Normalization
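Experience replay appears in the Methods list, and the abstract cites its large buffer as one reason DQN is hard to fit on edge devices. For comparison, a generic DQN-style replay buffer is sketched below: a capacity-bounded FIFO of transitions with uniform random sampling. The class name, tuple layout, and sampling scheme are common conventions assumed for illustration, not details taken from this paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience replay buffer (generic DQN-style sketch)."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buf = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)     # seeded for reproducible sampling

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling without replacement from the stored transitions
        return self.rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)
```

Memory use grows with capacity times transition size, and every training step draws a mini-batch. That storage and batch work is the overhead the abstract attributes to DQN on resource-limited devices.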
