Loading paper
SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges | Tomesphere