TL;DR
Hystar introduces a hypernetwork-based method for style-adaptive image retrieval, dynamically adjusting model weights to improve performance across diverse query styles while maintaining stability and efficiency.
Contribution
The paper presents Hystar, a novel lightweight framework that uses hypernetworks and singular-value perturbations for dynamic, style-aware model adaptation in image retrieval tasks.
Findings
Hystar outperforms strong baselines on multi-style retrieval benchmarks.
It achieves state-of-the-art results with parameter efficiency.
Hystar maintains stability across diverse styles.
Abstract
Query-based image retrieval (QBIR) requires retrieving relevant images given diverse and often stylistically heterogeneous queries, such as sketches, artworks, or low-resolution previews. While large-scale vision-language representation models (VLRMs) like CLIP offer strong zero-shot retrieval performance, they struggle with distribution shifts caused by unseen query styles. In this paper, we propose the Hypernetwork-driven Style-adaptive Retrieval (Hystar), a lightweight framework that dynamically adapts model weights to each query's style. Hystar employs a hypernetwork to generate singular-value perturbations for attention layers, enabling flexible per-input adaptation, while static singular-value offsets on MLP layers ensure cross-style stability. To better handle semantic confusions across styles, we design StyleNCE as part of Hystar, an optimal-transport-weighted…
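The adaptation mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The matrix sizes, the two-layer `StyleHyperNet`, its zero-initialized output layer, and the random style embedding are all assumptions made for illustration; the abstract only states that a hypernetwork produces per-query singular-value perturbations for attention layers and that MLP layers receive static singular-value offsets.

```python
import numpy as np

rng = np.random.default_rng(0)

def svd_factors(weight):
    """Factor a frozen weight matrix as U @ diag(S) @ Vh."""
    return np.linalg.svd(weight, full_matrices=False)

def perturbed_weight(U, S, Vh, delta):
    """Rebuild the weight with its singular values shifted by delta."""
    return (U * (S + delta)) @ Vh  # U * s scales columns, i.e. U @ diag(s)

class StyleHyperNet:
    """Hypothetical two-layer MLP: style embedding -> one delta per singular value."""

    def __init__(self, style_dim, num_singular, hidden=16):
        self.w_in = rng.standard_normal((hidden, style_dim)) * 0.1
        # Assumption: zero output layer, so adaptation starts as a no-op.
        self.w_out = np.zeros((num_singular, hidden))

    def __call__(self, style_embedding):
        return self.w_out @ np.tanh(self.w_in @ style_embedding)

# Toy frozen weights; the sizes are illustrative only.
attn_weight = rng.standard_normal((8, 8))
mlp_weight = rng.standard_normal((16, 8))

attn_U, attn_S, attn_Vh = svd_factors(attn_weight)
mlp_U, mlp_S, mlp_Vh = svd_factors(mlp_weight)

hypernet = StyleHyperNet(style_dim=4, num_singular=attn_S.size)
mlp_offset = np.zeros_like(mlp_S)  # static offset, shared by every query

def adapt(style_embedding):
    """Return (attention weight, MLP weight) adapted for one query style."""
    attn_delta = hypernet(style_embedding)  # changes with each query
    attn = perturbed_weight(attn_U, attn_S, attn_Vh, attn_delta)
    mlp = perturbed_weight(mlp_U, mlp_S, mlp_Vh, mlp_offset)
    return attn, mlp

style = rng.standard_normal(4)  # stand-in for a learned style embedding
attn_adapted, mlp_adapted = adapt(style)
```

In this sketch only the singular values change, so each adapted layer costs one small vector of extra parameters rather than a full weight update. The attention delta depends on the query, while the MLP offset stays fixed across queries, mirroring the split between per-input flexibility and cross-style stability that the abstract describes.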