DROGO: Default Representation Objective via Graph Optimization in Reinforcement Learning
Hon Tik Tse, Marlos C. Machado

TL;DR
This paper introduces DROGO, an objective for training a neural network to approximate the principal eigenvector of the default representation (DR) directly, without first estimating the DR matrix. The learned eigenvector is then used for reward shaping.
Contribution
The paper derives an objective for learning the principal eigenvector of the default representation directly with a neural network. This avoids the two steps that limited earlier methods: approximating the full DR matrix and then performing an eigendecomposition.
Findings
The objective is shown empirically to approximate the principal eigenvector in a number of environments.
The learned eigenvectors are applied to reward shaping.
The approach targets high-dimensional spaces, where approximating the DR matrix and eigendecomposing it does not scale.
Abstract
In computational reinforcement learning, the default representation (DR) and its principal eigenvector have been shown to be effective for a wide variety of applications, including reward shaping, count-based exploration, option discovery, and transfer. However, in prior investigations, the eigenvectors of the DR were computed by first approximating the DR matrix, and then performing an eigendecomposition. This procedure is computationally expensive and does not scale to high-dimensional spaces. In this paper, we derive an objective for directly approximating the principal eigenvector of the DR with a neural network. We empirically demonstrate the effectiveness of the objective in a number of environments, and apply the learned eigenvectors for reward shaping.
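For context, the two-step baseline the abstract describes (build the DR matrix, then eigendecompose it) can be sketched on a toy problem. This is a minimal sketch, not the paper's code. It assumes the DR definition commonly used in the linearly solvable MDP literature, Z = (diag(exp(-r / lambda)) - P)^{-1}, which this page does not state. The 5-state chain, the reward values, and the helper names `default_representation` and `principal_eigenvector` are all hypothetical.

```python
import numpy as np


def default_representation(P, r, lam=1.0):
    """DR under the ASSUMED definition Z = (diag(exp(-r / lam)) - P)^{-1}.

    P: (n, n) transition matrix of the default policy.
    r: (n,) state rewards, assumed strictly negative so the inverse exists.
    """
    return np.linalg.inv(np.diag(np.exp(-r / lam)) - P)


def principal_eigenvector(Z):
    """Unit-norm eigenvector for the eigenvalue with the largest real part."""
    eigvals, eigvecs = np.linalg.eig(Z)
    v = eigvecs[:, np.argmax(eigvals.real)].real
    v = v / np.linalg.norm(v)
    # Eigenvectors are defined up to sign; pick the non-negative orientation.
    return v if v.sum() >= 0 else -v


# Hypothetical toy problem: uniform random walk on a 5-state chain
# with reflecting ends (a blocked move keeps the agent in place).
n = 5
P = np.zeros((n, n))
for s in range(n):
    for t in (s - 1, s + 1):
        P[s, min(max(t, 0), n - 1)] += 0.5

# Negative reward everywhere, with a larger penalty on the middle state.
r = np.array([-1.0, -1.0, -2.0, -1.0, -1.0])

Z = default_representation(P, r)   # step 1: explicit n x n matrix
e = principal_eigenvector(Z)       # step 2: full eigendecomposition
print(e)
```

Both steps work with an explicit n-by-n matrix, so memory grows quadratically and the inverse and eigendecomposition roughly cubically with the number of states. That cost is what motivates learning the eigenvector directly from samples instead.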
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning
