SeeSaw: Interactive Ad-hoc Search Over Image Databases
Oscar Moll, Manuel Favela, Samuel Madden, Vijay Gadepally, Michael Cafarella

TL;DR
SeeSaw is an interactive system for ad-hoc image search that combines visual-semantic embeddings such as CLIP with user feedback, improving search accuracy, especially on difficult queries.
Contribution
The paper introduces SeeSaw, a novel system that effectively combines visual-semantic embeddings with user feedback to improve ad-hoc image search results.
Findings
SeeSaw improves Average Precision (AP) by 0.08 on a broad benchmark.
It increases AP by 0.27 on difficult queries where CLIP alone struggles.
SeeSaw outperforms both CLIP-only and active-learning baselines across multiple datasets.
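The findings above are reported as changes in Average Precision. As a reminder of what that metric measures, here is a minimal sketch of the textbook AP computation for a single query. The function name `average_precision` and the input format (a list of booleans in ranked order) are illustrative assumptions, and the paper's exact evaluation protocol may differ, for example in how results are truncated.

```python
def average_precision(ranked_relevance):
    """Textbook AP for one query.

    ranked_relevance: booleans in ranked order, True where the result is relevant.
    Assumption: every relevant item appears somewhere in the list, so the
    number of hits equals the total number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at rank k, counted only at ranks holding a relevant item.
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


# Made-up rankings of the same four images, two of which are relevant.
early = average_precision([True, False, True, False])
late = average_precision([False, True, False, True])
print(round(early, 3), round(late, 3))
```

Because AP lies between 0 and 1, the metric rewards rankings that surface relevant images earlier, which is the behavior an interactive search system aims to improve.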
Abstract
As image datasets become ubiquitous, the problem of ad-hoc searches over image data is increasingly important. Many high-level data tasks in machine learning, such as constructing datasets for training and testing object detectors, imply finding ad-hoc objects or scenes within large image datasets as a key sub-problem. New foundational visual-semantic embeddings trained on massive web datasets such as Contrastive Language-Image Pre-Training (CLIP) can help users start searches on their own data, but we find there is a long tail of queries where these models fall short in practice. SeeSaw is a system for interactive ad-hoc searches on image datasets that integrates state-of-the-art embeddings like CLIP with user feedback in the form of box annotations to help users quickly locate images of interest in their data even in the long tail of harder queries. One key challenge for SeeSaw is…
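To make the general idea of combining an embedding-based ranking with user feedback concrete, the following is a minimal sketch using a classic Rocchio-style query update. This is a swapped-in textbook technique, not SeeSaw's method: the abstract describes feedback as box annotations, whereas this sketch assumes whole-image positive and negative labels. The tiny 2-D vectors stand in for real CLIP embeddings, and the names `rank` and `refine` and the weights `alpha`, `beta`, `gamma` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def rank(query, images):
    """Return image ids ordered by cosine similarity to the query, best first."""
    q = normalize(query)
    scores = {
        img_id: sum(a * b for a, b in zip(q, normalize(vec)))
        for img_id, vec in images.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def refine(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move the query toward positives, away from negatives."""
    dim = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos, neg = mean(positives), mean(negatives)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


# Hypothetical stand-ins for image embeddings and a text-query embedding.
images = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
query = [1.0, 0.2]

before = rank(query, images)
# Suppose the user marks image "c" as relevant and image "a" as not relevant.
query = refine(query, positives=[images["c"]], negatives=[images["a"]])
after = rank(query, images)
print(before, after)
```

In this toy run the image marked relevant moves ahead of the one marked not relevant after a single round of feedback; a real system would repeat the loop over many rounds and far larger embeddings.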
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
