GalleryGPT: Analyzing Paintings with Large Multimodal Models

Yi Bin; Wenhao Shi; Yujuan Ding; Zhiqiang Hu; Zheng Wang; Yang Yang,; See-Kiong Ng; Heng Tao Shen

arXiv:2408.00491·cs.CL·August 2, 2024

GalleryGPT: Analyzing Paintings with Large Multimodal Models

Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang,, See-Kiong Ng, Heng Tao Shen

PDF

1 Repo

TL;DR

GalleryGPT leverages large multimodal models to generate detailed formal analyses of paintings, significantly advancing AI's capability to interpret complex visual art through a new dataset and fine-tuned architecture.

Contribution

The paper introduces a new dataset and a fine-tuned multimodal model, GalleryGPT, for comprehensive formal analysis of artworks, surpassing previous simple classification tasks.

Findings

01

GalleryGPT outperforms baseline models in formal analysis tasks.

02

The model demonstrates strong zero-shot generalization capabilities.

03

The dataset supports detailed art analysis research.

Abstract

Artwork analysis is important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and facilitate the critical thinking ability. Understanding artworks is challenging due to its subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by the data collection and model ability, previous works for automatically analyzing artworks mainly focus on classification, retrieval, and other simple tasks, which is far from the goal of AI. To facilitate the research progress, in this paper, we step further to compose comprehensive analysis inspired by the remarkable perception and generation ability of large multimodal models. Specifically, we first propose a task of composing paragraph analysis for artworks, i.e., painting in this paper, only…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

steven640pixel/gallerygpt
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

MethodsFocus