Variance-reduced $Q$-learning is minimax optimal