Representation-Driven Reinforcement Learning

Ofir Nabati; Guy Tennenholtz; Shie Mannor

arXiv:2305.19922 · cs.LG · January 23, 2026 · 1 citation

TL;DR

This paper introduces a representation-driven framework for reinforcement learning that embeds policies in a linear feature space and uses contextual-bandit techniques to guide exploration and exploitation. Applied to evolutionary and policy gradient-based methods, it reports significantly improved performance over traditional methods.

Contribution

The paper embeds policy networks into a linear feature space, which reframes the exploration-exploitation problem as a representation-exploitation problem. The framework is applied to both evolutionary and policy gradient-based RL methods.

Findings

01 Significantly improved performance compared to traditional methods

02 Effective application to evolutionary and policy gradient-based approaches

03 Policy representation matters in determining exploration-exploitation strategies

Abstract

We present a representation-driven framework for reinforcement learning. By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. Particularly, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation problem as a representation-exploitation problem, where good policy representations enable optimal exploration. We demonstrate the effectiveness of this framework through its application to evolutionary and policy gradient-based approaches, leading to significantly improved performance compared to traditional methods. Our framework provides a new perspective on reinforcement learning, highlighting the importance of policy representation in determining optimal exploration-exploitation strategies.
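
The idea of treating each policy's embedding as a bandit context can be illustrated with a toy linear bandit. The sketch below is not the authors' algorithm. It assumes a fixed, randomly generated embedding per policy (the paper works with learned representations), an expected return that is exactly linear in that embedding, and a standard LinUCB-style selection rule. All names (`lin_ucb_over_policies`, `features`, `true_w`, `alpha`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def lin_ucb_over_policies(features, true_w, rounds=200, alpha=1.0,
                          noise=0.1, lam=1.0, seed=0):
    """LinUCB-style selection over a fixed set of policy embeddings.

    features: (n_policies, d) array; row i is a hypothetical embedding of policy i.
    true_w:   (d,) hidden weights; policy i has expected return features[i] @ true_w.
    Returns the final ridge estimate of the weights and the list of chosen indices.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    A = lam * np.eye(d)   # regularized design matrix: lam * I + sum of x x^T
    b = np.zeros(d)       # sum of x * observed_return
    picks = []
    for _ in range(rounds):
        A_inv = np.linalg.inv(A)
        w_hat = A_inv @ b
        # Exploration bonus sqrt(x^T A^{-1} x): large for poorly explored directions.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        i = int(np.argmax(features @ w_hat + alpha * bonus))
        # Simulated noisy return of the chosen policy (stand-in for a rollout).
        ret = features[i] @ true_w + noise * rng.normal()
        A += np.outer(features[i], features[i])
        b += features[i] * ret
        picks.append(i)
    return np.linalg.solve(A, b), picks


# Toy usage: 20 candidate policies with 5-dimensional embeddings.
rng = np.random.default_rng(1)
features = rng.normal(size=(20, 5))
true_w = rng.normal(size=5)
w_hat, picks = lin_ucb_over_policies(features, true_w)
print("most-selected policy:", int(np.bincount(picks).argmax()))
print("true best policy:    ", int(np.argmax(features @ true_w)))
```

In this toy setting the quality of exploration depends entirely on how well the embedding predicts returns, which is the sense in which the abstract speaks of a representation-exploitation problem.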

Videos

Representation-Driven Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research