LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education
Unggi Lee, Minji Jeon, Yunseo Lee, Gyuri Byun, Yoorim Son, Jaeyoon Shin, Hongkyu Ko, Hyeoncheol Kim

TL;DR
This paper introduces LLaVA-Docent, a multimodal large language model designed as a personalized art appreciation tutor, demonstrating its potential to enhance art education through AI-driven conversations and tailored engagement.
Contribution
Through an iterative design process, the study develops LLaVA-Docent, a novel multimodal AI model for art education, and contributes a new virtual dialogue dataset generated by GPT-4.
Findings
LLaVA-Docent effectively engages users in art appreciation conversations.
Benchmarking shows LLaVA-Docent outperforms alternative models in specific tasks.
The model demonstrates potential for personalized art education applications.
Abstract
Despite the development of AI systems to support learning in many domains, AI assistance for art appreciation education has not been extensively explored. Art appreciation, often perceived by students as an unfamiliar and challenging endeavor, can be made more accessible by a generative-AI-enabled conversation partner that provides tailored questions and encourages the audience to appreciate artwork deeply. This study explores the application of multimodal large language models (MLLMs) in art appreciation education, with a focus on developing LLaVA-Docent, a model designed to serve as a personal tutor for art appreciation. Our approach involved design and development research, iteratively refining the application to produce a functional MLLM-enabled chatbot along with a data design framework for art appreciation education. To that end, we…
Taxonomy
Topics: Cultural and Artistic Studies
Methods: Attention Is All You Need · Focus · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding · Linear Layer · Multi-Head Attention
