A study of first-passage time minimization via Q-learning in heated gridworlds

M.A. Larchenko; P. Osinenko; G. Yaremenko; V.V. Palyulin

arXiv:2110.02129·math.OC·October 6, 2021

Open Access

TL;DR

This paper investigates how reinforcement learning agents optimize first-passage times in heated gridworlds with uneven noise, revealing biases in common algorithms that impact exploration and performance.

Contribution

It provides a detailed analysis of bias effects in tabular Q-learning, SARSA, Expected SARSA, and Double Q-learning in environments with uneven noise levels.
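
For orientation, the textbook one-step updates of the four methods named above can be written side by side. The notation here (learning rate \alpha, discount \gamma, policy \pi, and the two tables Q_A and Q_B) follows common textbook usage and is an assumption, not necessarily the notation used in the paper.

```latex
\begin{align*}
\text{Q-learning:} \quad
  & Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \bigr] \\
\text{SARSA:} \quad
  & Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[ r + \gamma\, Q(s',a') - Q(s,a) \bigr],
    \qquad a' \sim \pi(\cdot \mid s') \\
\text{Expected SARSA:} \quad
  & Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[ r + \gamma \sum_{a'} \pi(a' \mid s')\, Q(s',a') - Q(s,a) \bigr] \\
\text{Double Q-learning:} \quad
  & Q_A(s,a) \leftarrow Q_A(s,a) + \alpha \bigl[ r + \gamma\, Q_B\bigl(s', \arg\max_{a'} Q_A(s',a')\bigr) - Q_A(s,a) \bigr]
\end{align*}
```

In Double Q-learning the roles of Q_A and Q_B are swapped at random on each step, so that the action is selected with one table and evaluated with the other.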

Findings

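1. High learning rates hinder exploration of high-temperature (high-noise) regions.

2. Sufficiently low learning rates increase the presence of agents in high-temperature regions.

3. These biases of temporal-difference (TD) methods should be taken into account in real-world physical applications and agent design.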

Abstract

Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, one often encounters noise levels that are unevenly distributed across the environment. We extensively study how a learning agent fares in one- and two-dimensional heated gridworlds with an uneven temperature distribution. The results show certain bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning. While a high learning rate prevents exploration of regions with higher temperature, a low enough rate increases the presence of agents in such regions. The discovered peculiarities and biases of temporal-difference-based reinforcement learning methods should be taken into account in real-world physical applications and agent design.
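
The abstract does not spell out the environment's dynamics, so the following is only a minimal sketch of the setting under stated assumptions, not the authors' implementation. It assumes a 1D chain of cells with the target at the right end, a reward of -1 per step (so that, with no discounting, an episode's return is minus its first-passage time), and a hypothetical per-cell `noise` probability, standing in for temperature, with which the chosen move is replaced by a random one. The names `step`, `q_learning`, and `noise` are illustrative.

```python
import random

ACTIONS = (-1, +1)  # step left, step right


def step(state, action, noise, rng):
    """One move on a 1D chain of len(noise) cells.

    ASSUMPTION: with probability noise[state] the chosen move is replaced
    by a uniformly random one, standing in for a thermal kick.
    """
    if rng.random() < noise[state]:
        action = rng.choice(ACTIONS)
    last = len(noise) - 1
    next_state = min(max(state + action, 0), last)  # walls at both ends
    done = next_state == last                       # target: right-most cell
    return next_state, -1.0, done                   # -1 per step taken


def q_learning(noise, episodes=500, alpha=0.1, gamma=1.0,
               epsilon=0.1, max_steps=1000, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in noise]  # q[state][action index]
    for _ in range(episodes):
        state = 0  # every episode starts in the left-most cell
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            nxt, reward, done = step(state, ACTIONS[a], noise, rng)
            # Off-policy TD target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    cold = q_learning([0.0] * 6)                             # no noise anywhere
    hot_middle = q_learning([0.0, 0.0, 0.8, 0.8, 0.0, 0.0])  # heated middle cells
    for name, table in (("cold", cold), ("hot middle", hot_middle)):
        print(name, [round(max(row), 2) for row in table])
```

Varying `alpha` and the `noise` profile in this sketch is one way to probe how the learning rate interacts with heated regions. The quantitative effects reported in the paper depend on its actual environment definition, which may differ from the assumptions made here.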

Taxonomy

Topics: Optimization and Search Problems · Distributed Control Multi-Agent Systems · Modular Robots and Swarm Intelligence

Methods: SARSA · Double Q-learning · Expected SARSA