PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection