Context-Infused Visual Grounding for Art

Selina Khan; Nanne van Noord

arXiv:2410.12369 · cs.CV · October 17, 2024

TL;DR

This paper introduces CIGAr, a visual grounding method tailored for artworks that leverages textual descriptions during training, along with a new dataset, Ukiyo-eVG, to improve object localization in art images.

Contribution

The paper proposes CIGAr, a novel visual grounding approach that incorporates artwork descriptions as context, and introduces Ukiyo-eVG, a new annotated dataset for phrase-grounding in art.

Findings

01 CIGAr outperforms existing methods on art datasets.

02 Ukiyo-eVG provides manually annotated phrase-grounding annotations for art.

03 The paper sets a new state of the art for object detection on two artwork datasets.

Abstract

Many artwork collections contain textual attributes that provide rich and contextualised descriptions of artworks. Visual grounding offers the potential for localising subjects within these descriptions on images; however, existing approaches are trained on natural images and generalise poorly to art. In this paper, we present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises the artwork descriptions during training as context, thereby enabling visual grounding on art. In addition, we present a new dataset, Ukiyo-eVG, with manually annotated phrase-grounding annotations, and we set a new state-of-the-art for object detection on two artwork datasets.
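
To make the task concrete, the following is a minimal sketch of what a single phrase-grounding example could look like: an artwork description, a phrase inside it, and the bounding box that phrase refers to. Everything here is an assumption for illustration only. The class names, field names, sample description, and box coordinates are hypothetical and are not taken from the paper, the Ukiyo-eVG dataset, or the CIGAr repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedPhrase:
    """A phrase in a description tied to an image region (hypothetical schema)."""
    start: int  # character offset into the description, inclusive
    end: int    # character offset into the description, exclusive
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class ArtworkExample:
    """An artwork description plus the phrases grounded in its image."""
    description: str
    phrases: tuple[GroundedPhrase, ...]

    def phrase_text(self, index: int) -> str:
        """Recover the phrase string from its character offsets."""
        phrase = self.phrases[index]
        return self.description[phrase.start:phrase.end]


def ground(description: str, phrase: str,
           box: tuple[float, float, float, float]) -> GroundedPhrase:
    """Locate the first occurrence of `phrase` and attach a bounding box to it."""
    start = description.find(phrase)
    if start < 0:
        raise ValueError(f"phrase {phrase!r} not found in description")
    return GroundedPhrase(start=start, end=start + len(phrase), box=box)


# Made-up description and coordinates, purely illustrative.
description = "A woman holding a fan stands beside a cherry tree."
example = ArtworkExample(
    description=description,
    phrases=(
        ground(description, "a fan", (120.0, 210.0, 180.0, 260.0)),
        ground(description, "a cherry tree", (300.0, 40.0, 520.0, 480.0)),
    ),
)
```

Storing character offsets rather than the phrase string keeps each phrase tied to its position in the full description, so the surrounding text stays available as context for a model that wants it.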

Code & Models

Repositories

selinakhan/CIGAr (PyTorch, official)

Taxonomy

Topics: Aesthetic Perception and Analysis · 3D Surveying and Cultural Heritage · Cinema and Media Studies

Methods: Sparse Evolutionary Training