LTD-Bench: Evaluating Large Language Models by Letting Them Draw