TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
Ioana Croitoru, Simion-Vlad Bogolin, Marius Leordeanu, Hailin Jin, Andrew Zisserman, Samuel Albanie, Yang Liu

TL;DR
TeachText is a generalized distillation method that uses complementary cues from multiple text encoders to supervise a text-video retrieval model. The same approach extends to video-side modalities, reducing the number of modalities used at test time without compromising performance, and it can also help eliminate noise from retrieval datasets.
Contribution
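The authors state that they are the first to investigate algorithms for exploiting large-scale language pretraining in text-video retrieval. They also extend the resulting distillation method to video-side modalities so that fewer modalities are needed at test time.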
Findings
Advances the state of the art on several video retrieval benchmarks by a significant margin.
Reduces the number of modalities used at test time without compromising performance.
Helps eliminate noise from retrieval datasets.
Abstract
In recent years, considerable progress on the task of text-video retrieval has been achieved by leveraging large-scale pretraining on visual and audio datasets to construct powerful video encoders. By contrast, despite the natural symmetry, the design of effective algorithms for exploiting large-scale language pretraining remains under-explored. In this work, we are the first to investigate the design of such algorithms and propose a novel generalized distillation method, TeachText, which leverages complementary cues from multiple text encoders to provide an enhanced supervisory signal to the retrieval model. Moreover, we extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time without compromising performance. Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and…
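The distillation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every function name is hypothetical, and two details are assumptions that the abstract does not state, namely that the teachers' text-video similarity matrices are combined by a plain mean and that the student is pulled toward that mean with a squared-error term added to the usual retrieval loss.

```python
# Hypothetical sketch: distilling several text-encoder "teachers" into one student.
# Assumptions (not stated in the abstract): teachers are combined by a plain mean,
# and the distillation distance is a mean squared error over similarity entries.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def similarity_matrix(text_embs, video_embs):
    """Entry [i][j] scores caption i against video j."""
    return [[dot(t, v) for v in video_embs] for t in text_embs]


def aggregate_teachers(teacher_sims):
    """Element-wise mean of the teachers' similarity matrices."""
    n = len(teacher_sims)
    rows = len(teacher_sims[0])
    cols = len(teacher_sims[0][0])
    return [
        [sum(sim[i][j] for sim in teacher_sims) / n for j in range(cols)]
        for i in range(rows)
    ]


def distillation_loss(student_sim, target_sim):
    """Mean squared difference between student and target similarities."""
    diffs = [
        (s - t) ** 2
        for s_row, t_row in zip(student_sim, target_sim)
        for s, t in zip(s_row, t_row)
    ]
    return sum(diffs) / len(diffs)


def total_loss(retrieval_loss, student_sim, teacher_sims, weight=1.0):
    """Retrieval loss plus a weighted distillation term."""
    target = aggregate_teachers(teacher_sims)
    return retrieval_loss + weight * distillation_loss(student_sim, target)


# Toy batch: 2 captions, 2 videos, 2 teachers built on different text encoders.
teacher_a = similarity_matrix([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
teacher_b = [[0.0, 0.0], [0.0, 1.0]]
student = [[1.5, 0.0], [0.0, 1.0]]
loss = total_loss(2.0, student, [teacher_a, teacher_b], weight=2.0)
```

In this toy batch the student disagrees with the averaged teachers on a single entry, so only that entry contributes to the distillation term. The retrieval loss is passed in as a plain number here; in a training loop it would presumably be computed from the same student similarity matrix.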
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Cancer-related molecular mechanisms research
