Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks