Posterior sampling for reinforcement learning: worst-case regret bounds