Dealing with Sparse Rewards Using Graph Neural Networks
Matvey Gerasyov, Ilya Makarov

TL;DR
This paper enhances reward shaping in deep reinforcement learning for 3D navigation tasks with sparse rewards by proposing graph neural network modifications, including advanced aggregation and attention mechanisms, improving learning efficiency and interpretability.
Contribution
It introduces two novel modifications to graph convolutional network-based reward shaping methods, incorporating advanced aggregation and attention mechanisms, validated in 3D navigation tasks.
Findings
Improved convergence in sparse reward navigation tasks.
Attention mechanism highlights important environment transitions.
Enhanced interpretability of learned reward functions.
Abstract
Deep reinforcement learning in partially observable environments is a difficult task in itself, and can be further complicated by a sparse reward signal. Most tasks involving navigation in three-dimensional environments provide the agent with extremely limited information. Typically, the agent receives a visual observation input from the environment and is rewarded once at the end of the episode. A good reward function could substantially improve the convergence of reinforcement learning algorithms for such tasks. The classic approach to increasing the density of the reward signal is to augment it with supplementary rewards. This technique is called reward shaping. In this study, we propose two modifications of one of the recent reward shaping methods based on graph convolutional networks: the first involving advanced aggregation functions, and the second utilizing the attention…
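The idea of augmenting a sparse reward with supplementary rewards can be illustrated with a small sketch. The abstract does not state the exact form of the supplementary term, so the code below assumes potential-based shaping, a common formulation in which the bonus is gamma * Phi(s') - Phi(s). The names shaped_reward, corridor_potential, and GOAL are hypothetical, and the hand-written potential on a toy 1-D corridor is only a stand-in for a learned one; it is not the method proposed in the paper.

```python
from typing import Callable

GOAL = 5  # hypothetical goal position in a toy 1-D corridor


def corridor_potential(position: int) -> float:
    """Hypothetical potential: higher (less negative) closer to the goal."""
    return -float(abs(GOAL - position))


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    potential: Callable[[int], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Add a potential-based bonus gamma * Phi(s') - Phi(s) to the reward.

    The potential of a terminal state is treated as zero, a common
    convention (an assumption here, not taken from the paper).
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


# The environment reward is zero until the goal, yet each step now
# carries a signal about whether the agent moved toward the goal.
toward = shaped_reward(0.0, state=2, next_state=3,
                       potential=corridor_potential, gamma=1.0)
away = shaped_reward(0.0, state=3, next_state=2,
                     potential=corridor_potential, gamma=1.0)
```

With gamma set to 1.0 in this toy setting, a step toward the goal receives a positive bonus and a step away receives a negative one, which is the denser signal that reward shaping aims to provide.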
Taxonomy
Topics: Reinforcement Learning in Robotics · Visual Attention and Saliency Detection
