An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

Hirohisa Watanabe; Mineto Tsukada; Hiroki Matsutani

arXiv:2005.04646 · cs.LG · March 14, 2023 · 5 cites

Open Access

TL;DR

This paper presents a lightweight FPGA-based reinforcement learning method using OS-ELM that is more resource-efficient and faster than traditional DQN approaches, suitable for low-cost edge devices.

Contribution

It introduces a novel on-device reinforcement learning approach leveraging OS-ELM and regularization techniques, optimized for FPGA implementation on low-cost hardware.

Findings

01

Achieves 29.77x faster training on CartPole-v0 compared to DQN.

02

Demonstrates stable learning with spectral normalization and L2 regularization.

03

Successfully implements reinforcement learning on PYNQ-Z1 FPGA platform.
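
Finding 02 refers to spectral normalization. As a rough illustration of the general technique (not the authors' FPGA implementation), the sketch below rescales a weight matrix so that its largest singular value is 1, which bounds how much a linear layer can stretch the distance between two inputs. The matrix `W`, its 4 x 64 shape, and the function name `spectral_normalize` are assumptions chosen for the example; this summary does not say which weights the paper normalizes.

```python
import numpy as np

def spectral_normalize(W):
    """Scale W so that its largest singular value (spectral norm) is 1."""
    # For a 2-D array, ord=2 returns the largest singular value.
    sigma = np.linalg.norm(W, ord=2)
    if sigma == 0.0:
        return W.copy()  # an all-zero matrix cannot be rescaled
    return W / sigma

# Hypothetical weight matrix: 4 inputs, 64 hidden nodes.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 64))
W_sn = spectral_normalize(W)

# After normalization the linear map cannot increase the distance
# between two inputs (its Lipschitz constant is at most 1).
x, y = rng.normal(size=4), rng.normal(size=4)
gap_in = np.linalg.norm(x - y)
gap_out = np.linalg.norm(x @ W_sn - y @ W_sn)
print(gap_out <= gap_in)
```

Bounding the layer in this way is one route to keeping network outputs within a limited range, which is the stabilizing effect the findings describe.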

Abstract

DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning becomes stable. The…
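
To make the backpropagation-free training mentioned in the abstract more concrete, here is a minimal NumPy sketch of the standard OS-ELM update with an L2-regularized initial solve. It is an illustration under stated assumptions, not the paper's FPGA design: the class name `OSELM`, the ReLU activation, the uniform weight initialization, and the regularization strength `delta` are all choices made for this example. The hidden weights stay fixed and random; only the output weights `beta` are updated, one sample at a time, without a replay buffer or gradient descent.

```python
import numpy as np

class OSELM:
    """Single-hidden-layer network trained by online sequential least squares."""

    def __init__(self, n_in, n_hidden, n_out, delta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random input weights and biases are fixed after initialization.
        self.alpha = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden))
        self.bias = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.delta = delta  # L2 regularization strength (assumed value)
        self.beta = np.zeros((n_hidden, n_out))
        self.P = None

    def hidden(self, X):
        # ReLU is an assumption made for this sketch.
        return np.maximum(0.0, np.atleast_2d(X) @ self.alpha + self.bias)

    def init_train(self, X0, T0):
        # Ridge-regression solve on the first chunk of data.
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.delta * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def seq_train(self, x, t):
        # Rank-1 recursive least-squares update for a single sample.
        h = self.hidden(x)                # shape (1, n_hidden)
        Ph = self.P @ h.T                 # shape (n_hidden, 1)
        self.P -= (Ph @ Ph.T) / (1.0 + (h @ Ph).item())
        self.beta += self.P @ h.T @ (np.atleast_2d(t) - h @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta

# Toy data: 4-dimensional inputs, 2 outputs (shapes chosen for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
T = rng.normal(size=(40, 2))

model = OSELM(n_in=4, n_hidden=8, n_out=2, delta=0.5)
model.init_train(X[:10], T[:10])
for x, t in zip(X[10:], T[10:]):
    model.seq_train(x, t)

# The sequential result matches a batch ridge solve over all 40 samples.
H_all = model.hidden(X)
beta_batch = np.linalg.solve(H_all.T @ H_all + 0.5 * np.eye(8), H_all.T @ T)
print(np.allclose(model.beta, beta_batch, atol=1e-6))
```

Each sequential step needs only matrix-vector products and one scalar division, which suggests why this style of update suits small devices. How the paper derives Q-learning targets for these updates is not covered by this sketch.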

Taxonomy

Topics: Machine Learning and ELM · Advanced Memory and Neural Computing · Adaptive Dynamic Programming Control

Methods: Experience Replay · Weight Decay · Q-Learning · Spectral Normalization