Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs

Ruiquan Huang; Donghao Li; Yingbin Liang; Jing Yang

arXiv:2605.01242 · cs.LG · May 5, 2026

TL;DR

This paper introduces a provably efficient actor-critic algorithm for low-rank MDPs that leverages supervised learning for policy evaluation, improving computational and sample efficiency.

Contribution

The paper proposes a novel optimistic actor-critic method that relies solely on a policy evaluation oracle, avoiding computationally expensive planning or optimization.

Findings

01. Outperforms existing sample complexity guarantees for low-rank MDPs

02. Avoids computationally expensive planning oracles

03. Validated with experiments on standard Gym environments

Abstract

Reinforcement learning (RL) is a fundamental framework for sequential decision-making, in which an agent learns an optimal policy through interactions with an unknown environment. In settings with function approximation, many existing RL algorithms achieve favorable sample complexity, but often rely on computationally intractable oracles. In this paper, we use supervised learning as a computational proxy to establish a clear hierarchy of commonly adopted RL oracles under low-rank Markov Decision Processes (MDPs). This hierarchy shows that policy evaluation is the most computationally efficient oracle, provided that supervised learning can be efficiently solved. Motivated by this observation, we propose a novel optimistic actor-critic algorithm that relies solely on the policy evaluation oracle. We prove that our algorithm outperforms the existing sample complexity guarantees for…
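
As a rough illustration of why policy evaluation can be treated as a supervised learning problem in a low-rank MDP, consider the sketch below. It is not the paper's algorithm. It relies on the standard low-rank factorization P(s' | s, a) = <phi(s, a), mu(s')>, under which the expected next-state value is linear in phi(s, a) and can be estimated by least-squares regression. The sketch assumes a small finite-horizon toy MDP with known features and known rewards; the function names (`make_low_rank_mdp`, `exact_q`, `regression_q`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_low_rank_mdp(n_states=6, n_actions=2, rank=3, seed=0):
    """Hypothetical toy MDP whose transition matrix factorizes as P = phi @ mu."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.ones(rank), size=n_states * n_actions)  # (S*A, d)
    mu = rng.dirichlet(np.ones(n_states), size=rank)               # (d, S)
    P = phi @ mu                         # (S*A, S); each row is a distribution
    r = rng.uniform(size=n_states * n_actions)  # rewards in [0, 1], assumed known
    return phi, P, r


def exact_q(P, r, pi, H):
    """Ground-truth Q-values of policy pi by backward dynamic programming."""
    SA, S = P.shape
    A = SA // S
    V = np.zeros(S)
    Q = np.zeros((H, SA))
    for h in reversed(range(H)):
        Q[h] = r + P @ V
        V = (pi * Q[h].reshape(S, A)).sum(axis=1)
    return Q


def regression_q(phi, P, r, pi, H, n_samples=20000, ridge=1e-3, seed=1):
    """Policy evaluation via ridge regression of sampled V(s') on phi(s, a)."""
    rng = np.random.default_rng(seed)
    SA, S = P.shape
    A = SA // S
    d = phi.shape[1]
    cdf = np.cumsum(P, axis=1)
    V = np.zeros(S)
    Q = np.zeros((H, SA))
    for h in reversed(range(H)):
        # Draw (s, a) pairs uniformly, then sample s' by inverse-CDF lookup.
        sa = rng.integers(SA, size=n_samples)
        u = rng.uniform(size=n_samples)
        s_next = np.minimum((u[:, None] > cdf[sa]).sum(axis=1), S - 1)
        # Supervised step: regress the target V_{h+1}(s') on phi(s, a).
        X, y = phi[sa], V[s_next]
        w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
        Q[h] = r + phi @ w
        V = (pi * Q[h].reshape(S, A)).sum(axis=1)
    return Q


phi, P, r = make_low_rank_mdp()
pi = np.full((6, 2), 0.5)  # uniform policy over the 2 actions in each of 6 states
Q_hat = regression_q(phi, P, r, pi, H=3)
Q_true = exact_q(P, r, pi, H=3)
print("max abs error:", np.abs(Q_hat - Q_true).max())
```

The only learning step here is an ordinary regression, with no planning or optimization over policies. The setting addressed in the paper is more general than this toy example, which fixes the features in advance.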
