TL;DR
This paper demonstrates that reinforcement learning with partial $q^\pi$-realizability is computationally hard, establishing NP-hardness and exponential lower bounds, thus highlighting fundamental limitations in this approximation regime.
Contribution
It introduces the partial $q^\pi$-realizability framework and proves its computational hardness, extending complexity results to a more practical RL setting.
Findings
NP-hardness under greedy policy set
Exponential lower bound with softmax policies
Hardness results mirror those in $q^*$-realizability
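The two policy classes named in the findings can be illustrated with a minimal sketch. It assumes each policy is parameterized by a vector `theta` that scores actions through a linear map of per-action features; the function names, the feature layout, and the `temperature` parameter are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def action_scores(theta: Vector, action_features: Sequence[Vector]) -> List[float]:
    """Linear score <theta, phi(s, a)> for each action in one state (assumed form)."""
    return [sum(t * f for t, f in zip(theta, phi)) for phi in action_features]


def greedy_policy(theta: Vector, action_features: Sequence[Vector]) -> List[float]:
    """Argmax policy: all probability on the highest-scoring action.

    Ties go to the lowest action index, an arbitrary convention of this sketch.
    """
    scores = action_scores(theta, action_features)
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if a == best else 0.0 for a in range(len(scores))]


def softmax_policy(
    theta: Vector, action_features: Sequence[Vector], temperature: float = 1.0
) -> List[float]:
    """Softmax policy: probabilities proportional to exp(score / temperature)."""
    scaled = [s / temperature for s in action_scores(theta, action_features)]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Toy state with two actions and two-dimensional features (made-up numbers).
features = [[1.0, 0.0], [0.0, 1.0]]
theta = [2.0, 1.0]
greedy_probs = greedy_policy(theta, features)
softmax_probs = softmax_policy(theta, features)
```

In this sketch a policy set is the collection of such policies as `theta` ranges over the parameter space. The greedy class is deterministic, while the softmax class assigns nonzero probability to every action.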
Abstract
This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^\pi$-realizability. In this framework, the objective is to learn an $\epsilon$-optimal policy with respect to a predefined policy set $\Pi$, under the assumption that all value functions for policies in $\Pi$ are linearly realizable. The assumptions of this framework are weaker than those in $q^\pi$-realizability but stronger than those in $q^*$-realizability, providing a practical model where function approximation naturally arises. We prove that learning an $\epsilon$-optimal policy in this setting is computationally hard. Specifically, we establish NP-hardness under a parameterized greedy policy set (argmax) and show that, unless NP = RP, an exponential lower bound (in feature vector dimension) holds when the policy set contains…