On-the-fly Object Detection using StyleGAN with CLIP Guidance
Yuzhe Lu, Shusen Liu, Jayaraman J. Thiagarajan, Wesam Sakla, Rushil Anirudh

TL;DR
This paper introduces an automated method for object detection in satellite imagery that combines StyleGAN and CLIP to build detectors without human annotations, enabling rapid and scalable image analysis.
Contribution
The work presents a novel approach that leverages StyleGAN and CLIP to build object detectors on-the-fly without requiring labeled data or manual intervention.
Findings
Successfully detects objects in satellite images without annotations
Demonstrates the effectiveness of combining generative models with multi-modal learning
Enables rapid deployment of object detectors in new domains
Abstract
We present a fully automated framework for building object detectors on satellite imagery without requiring any human annotation or intervention. We achieve this by leveraging the combined power of modern generative models (e.g., StyleGAN) and recent advances in multi-modal learning (e.g., CLIP). While deep generative models effectively encode the key semantics pertinent to a data distribution, this information is not immediately accessible for downstream tasks, such as object detection. In this work, we exploit CLIP's ability to associate image features with text descriptions to identify neurons in the generator network, which are subsequently used to build detectors on-the-fly.
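The abstract does not say how generator neurons are selected or how they become detectors, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that each generated image has a CLIP similarity score for a text prompt and a mean activation per generator channel, ranks channels by Pearson correlation with that score, and thresholds the chosen channel's activation map to get a pseudo-mask and bounding box. The names `rank_neurons`, `pseudo_mask`, and `bounding_box`, the correlation criterion, and the fixed threshold are all hypothetical, and the numbers are toy values rather than real StyleGAN or CLIP outputs.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_neurons(activations, clip_scores, top_k=1):
    """Return the top_k channels whose activation tracks the CLIP score best.

    activations[i][c]: mean activation of channel c on generated image i.
    clip_scores[i]: assumed CLIP similarity between image i and the text prompt.
    """
    n_channels = len(activations[0])
    corr = [
        pearson([row[c] for row in activations], clip_scores)
        for c in range(n_channels)
    ]
    order = sorted(range(n_channels), key=lambda c: corr[c], reverse=True)
    return order[:top_k]


def pseudo_mask(act_map, threshold):
    """Binarize a 2D activation map: 1 where the activation exceeds threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in act_map]


def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of nonzero pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy data (assumption): 4 generated images, 3 generator channels.
clip_scores = [0.1, 0.9, 0.3, 0.7]
activations = [
    [0.5, 0.2, 0.1],
    [0.4, 0.1, 0.9],
    [0.6, 0.3, 0.3],
    [0.5, 0.2, 0.7],
]
selected = rank_neurons(activations, clip_scores, top_k=1)

# Toy activation map of the selected channel on one image.
act_map = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 0.7, 0.1],
]
mask = pseudo_mask(act_map, threshold=0.5)
box = bounding_box(mask)
print(selected, box)
```

In a real pipeline, boxes derived this way from generated images could serve as automatic labels for training a standard detector; that step is also an assumption here and is not shown.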
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
