LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring