TL;DR
ArtMentor is a framework that uses multimodal large language models (MLLMs) to support art evaluation, combining a new dataset with three systems so that MLLM capability can be assessed from recorded teacher interactions.
Contribution
This paper presents a novel HCI space design and an integrated system for evaluating MLLMs in art assessment, including a new dataset and modular evaluation agents.
Findings
GPT-4o effectively assists in art evaluation dialogues.
The dataset covers 380 sessions conducted by five art teachers across nine dimensions.
The system enables iterative upgrades and reliable assessments.
Abstract
Can Multimodal Large Language Models (MLLMs), with capabilities in perception, recognition, understanding, and reasoning, function as independent assistants in art evaluation dialogues? Current MLLM evaluation methods, which rely on subjective human scoring or costly interviews, lack comprehensive coverage of various scenarios. This paper proposes a process-oriented Human-Computer Interaction (HCI) space design to facilitate more accurate MLLM assessment and development. This approach aids teachers in efficient art evaluation while also recording interactions for MLLM capability assessment. We introduce ArtMentor, a comprehensive space that integrates a dataset and three systems to optimize MLLM evaluation. The dataset consists of 380 sessions conducted by five art teachers across nine critical dimensions. The modular system includes agents for entity recognition, review generation, and…
