Sketch3T: Test-Time Training for Zero-Shot SBIR
Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

TL;DR
This paper introduces Sketch3T, a test-time training approach for zero-shot sketch-based image retrieval that adapts to new categories and sketch styles using a self-supervised auxiliary task and meta-learning, outperforming existing methods.
Contribution
The paper proposes a novel test-time training paradigm with a self-supervised auxiliary task and meta-learning to adapt to new categories and sketch styles in zero-shot SBIR.
Findings
Outperforms state-of-the-art methods in zero-shot SBIR.
Effectively adapts to new sketching styles at test time.
Demonstrates robustness to distribution shifts in sketches.
Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) typically asks for a trained model to be applied as is to unseen categories. In this paper, we question this setup, arguing that it is by definition not compatible with the inherent abstract and subjective nature of sketches: the model might transfer well to new categories, but it will not understand sketches drawn from a different test-time distribution. We thus extend ZS-SBIR, asking it to transfer to both new categories and new sketch distributions. Our key contribution is a test-time training paradigm that can adapt using just one sketch. Since there is no paired photo, we make use of a sketch raster-vector reconstruction module as a self-supervised auxiliary task. To maintain the fidelity of the trained cross-modal joint embedding during test-time update, we design a novel meta-learning based training paradigm to learn a separation between…
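The test-time update described in the abstract can be illustrated with a toy example. This is a minimal sketch under strong assumptions, not the authors' implementation: the encoder and decoder are plain linear maps, the raster-vector reconstruction task is replaced by a simple reconstruction of the input vector, and only the encoder is updated. All names (adapt_on_one_sketch, rank_photos, W_enc, W_dec) are hypothetical. The meta-learned choice of which parameters to update is not modeled here.

```python
import numpy as np

def adapt_on_one_sketch(W_enc, W_dec, sketch, steps=5):
    """Test-time update of a toy linear encoder from one unlabeled sketch.

    Minimises 0.5 * ||W_dec @ (W_enc @ sketch) - sketch||^2 by gradient descent.
    Assumption: only the encoder changes; the decoder stays fixed.
    """
    W = W_enc.copy()
    # Step size 1/L, where L bounds the curvature of this quadratic loss in W.
    lr = 1.0 / (np.linalg.norm(W_dec, 2) ** 2 * float(sketch @ sketch))
    losses = []
    for _ in range(steps):
        err = W_dec @ (W @ sketch) - sketch        # reconstruction error
        losses.append(0.5 * float(err @ err))      # loss before this update
        W -= lr * np.outer(W_dec.T @ err, sketch)  # gradient step on the encoder
    return W, losses

def rank_photos(W_enc, sketch, photo_embeddings):
    """Return photo indices sorted from most to least cosine-similar."""
    q = W_enc @ sketch
    q = q / np.linalg.norm(q)
    g = photo_embeddings / np.linalg.norm(photo_embeddings, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Toy data: an 8-dimensional "sketch" and ten 4-dimensional photo embeddings.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 8))
W_dec = rng.normal(size=(8, 4))
sketch = rng.normal(size=8)
photos = rng.normal(size=(10, 4))

W_adapted, losses = adapt_on_one_sketch(W_enc, W_dec, sketch)
ranking = rank_photos(W_adapted, sketch, photos)
```

The point of the toy is the order of operations: the query sketch first drives a few self-supervised updates, and only then is it embedded and compared against the photo gallery. No paired photo or label is used during the update.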
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
