TL;DR
GalleryGPT is a fine-tuned large multimodal model that generates detailed formal analyses of paintings, supported by a new dataset built for this task.
Contribution
The paper introduces a new dataset and a fine-tuned multimodal model, GalleryGPT, for comprehensive formal analysis of artworks, moving beyond the classification and retrieval tasks that earlier work focused on.
Findings
GalleryGPT outperforms baseline models in formal analysis tasks.
The model demonstrates strong zero-shot generalization capabilities.
The dataset supports detailed art analysis research.
Abstract
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and foster critical thinking. Understanding artworks is challenging due to their subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by data collection and model ability, previous work on automatically analyzing artworks has mainly focused on classification, retrieval, and other simple tasks, which is far from the goal of AI. To facilitate research progress, in this paper we go a step further and compose comprehensive analyses, inspired by the remarkable perception and generation ability of large multimodal models. Specifically, we first propose a task of composing paragraph analysis for artworks, i.e., painting in this paper, only…
