Piecing Together Clues: A Benchmark for Evaluating the Detective Skills of Large Language Models