No Annotations for Object Detection in Art through Stable Diffusion
Patrick Ramos, Nicolas Gonthier, Selina Khan, Yuta Nakashima, Noa Garcia

TL;DR
NADA is a pipeline that uses diffusion models for zero-shot and weakly-supervised object detection in art images, requiring neither bounding-box annotations nor fine-tuning of its pretrained components.
Contribution
It is the first approach to enable zero-shot object detection in art using diffusion models without requiring annotations or fine-tuning.
Findings
Outperforms prior weakly-supervised methods on ArtDL 2.0 and IconArt datasets.
First work to achieve zero-shot object detection in art images.
Supports both weakly-supervised and zero-shot scenarios without fine-tuning.
Abstract
Object detection in art is a valuable tool for the digital humanities, as it allows for faster identification of objects in artistic and historical images compared to humans. However, annotating such images poses significant challenges due to the need for specialized domain expertise. We present NADA (no annotations for detection in art), a pipeline that leverages diffusion models' art-related knowledge for object detection in paintings without the need for full bounding box supervision. Our method, which supports both weakly-supervised and zero-shot scenarios and does not require any fine-tuning of its pretrained components, consists of a class proposer based on large vision-language models and a class-conditioned detector based on Stable Diffusion. NADA is evaluated on two artwork datasets, ArtDL 2.0 and IconArt, outperforming prior work in weakly-supervised detection, while being the…
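The two-stage design described in the abstract (a class proposer followed by a class-conditioned detector) can be sketched as a simple composition. This is a minimal illustration of the data flow only, not the authors' implementation: `propose_classes` and `class_heatmap` are hypothetical stand-ins for the vision-language model and the Stable Diffusion based detector, and turning a per-class heatmap into a box by relative thresholding is an assumption made here for the example.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices
Heatmap = Sequence[Sequence[float]]  # rows of per-pixel scores for one class


def heatmap_to_box(heatmap: Heatmap, rel_threshold: float = 0.5) -> Optional[Box]:
    """Return the tight box around pixels scoring at least rel_threshold * peak.

    Hypothetical post-processing step; the paper's actual box extraction may differ.
    """
    peak = max((v for row in heatmap for v in row), default=0.0)
    if peak <= 0:
        return None  # no positive evidence for this class
    cutoff = rel_threshold * peak
    coords = [
        (x, y)
        for y, row in enumerate(heatmap)
        for x, v in enumerate(row)
        if v >= cutoff
    ]
    if not coords:
        return None  # only reachable when rel_threshold > 1
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def detect(
    image: Any,
    propose_classes: Callable[[Any], List[str]],
    class_heatmap: Callable[[Any, str], Heatmap],
) -> List[Tuple[str, Box]]:
    """Stage 1 proposes class names; stage 2 localizes each proposed class."""
    detections = []
    for label in propose_classes(image):
        box = heatmap_to_box(class_heatmap(image, label))
        if box is not None:
            detections.append((label, box))
    return detections
```

Passing both stages as callables keeps the sketch independent of any particular model: in a weakly-supervised setting the proposer could be replaced by the known image-level labels, while in a zero-shot setting it would come from a pretrained model.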
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Music and Audio Processing
Methods: Diffusion
