HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks