When Elo Lies: Hidden Biases in Codeforces-Based Evaluation of Large Language Models