A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning Across Event Categories