In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne

TL;DR
This paper introduces In-Style, a text-video retrieval method that learns the style of text queries and transfers it to uncurated web videos. Training uses no paired text-video data, and the approach improves zero-shot retrieval performance.
Contribution
The paper proposes a new setting for text-video retrieval using uncurated, unpaired data and introduces a multi-style contrastive training approach to improve model generalization across datasets.
Findings
Achieves state-of-the-art zero-shot text-video retrieval performance.
Demonstrates effective style transfer from text queries to videos.
Improves generalization by training with multiple text styles.
Abstract
Large-scale noisy web image-text datasets have been proven to be efficient for learning robust vision-language models. However, when transferring them to the task of video retrieval, models still need to be fine-tuned on hand-curated paired text-video data to adapt to the diverse styles of video descriptions. To address this problem without the need for hand-annotated pairs, we propose a new setting, text-video retrieval with uncurated & unpaired data, that during training utilizes only text queries together with uncurated web videos without any paired text-video data. To this end, we propose an approach, In-Style, that learns the style of the text queries and transfers it to uncurated web videos. Moreover, to improve generalization, we show that one model can be trained with multiple text styles. To this end, we introduce a multi-style contrastive training procedure that improves the…
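The abstract names a multi-style contrastive training procedure, but this summary does not give its exact form. As a rough illustration only, the sketch below assumes a standard symmetric InfoNCE-style loss between text and video embeddings, averaged over several caption styles for the same batch of videos. The function names, the `temperature` value, and the `captions_by_style` layout are all hypothetical and are not taken from the paper.

```python
import numpy as np


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(text_emb, video_emb, temperature=0.05):
    """Symmetric InfoNCE-style loss (an assumption, not the paper's stated loss).

    Row i of text_emb is treated as the positive for row i of video_emb;
    every other row in the batch acts as a negative.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (batch, batch) cosine similarities
    idx = np.arange(len(t))
    loss_t2v = -_log_softmax(logits, axis=1)[idx, idx].mean()  # text -> video
    loss_v2t = -_log_softmax(logits, axis=0)[idx, idx].mean()  # video -> text
    return 0.5 * (loss_t2v + loss_v2t)


def multi_style_loss(captions_by_style, video_emb, temperature=0.05):
    """Average the contrastive loss over caption styles (hypothetical layout).

    captions_by_style maps a style name to a (batch, dim) array of text
    embeddings whose rows are aligned with the rows of video_emb.
    """
    losses = [
        symmetric_contrastive_loss(text_emb, video_emb, temperature)
        for text_emb in captions_by_style.values()
    ]
    return float(np.mean(losses))


# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
videos = rng.normal(size=(8, 16))
styles = {
    "style_a": videos + 0.01 * rng.normal(size=(8, 16)),
    "style_b": videos + 0.01 * rng.normal(size=(8, 16)),
}
aligned = multi_style_loss(styles, videos)
# Shifting the video rows by one breaks every text-video pairing.
shuffled = multi_style_loss(styles, np.roll(videos, 1, axis=0))
```

In this toy setup the correctly paired batch should give a lower loss than the shifted one, which is the behavior a contrastive retrieval objective is meant to reward. Averaging over styles is one simple way to let a single model see several caption styles per video; the actual weighting used by the authors may differ.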
Taxonomy
Topics: Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research · Domain Adaptation and Few-Shot Learning
