Watson & Holmes: A Naturalistic Benchmark for Comparing Human and LLM Reasoning