MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed
Jiaqi Samantha Zhan, Crystina Zhang, Shengyao Zhuang, Xueguang Ma, Jimmy Lin

TL;DR
This paper presents a unified multimodal video retrieval system built on OmniEmbed, which embeds visual, audio, and textual data in a single space to improve retrieval accuracy on complex, multilingual video collections. The submission achieved the highest score among public submissions on the MAGMaR shared task leaderboard as of May 20th, 2025.
Contribution
We apply OmniEmbed to unified multimodal video retrieval, fine-tune it on MultiVENT 2.0, report substantial improvements in retrieval performance, and open-source the model checkpoint.
Findings
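Achieved the highest score among public submissions on the MAGMaR shared task leaderboard as of May 20th, 2025.
Showed that a single unified embedding over text, images, audio, and video is effective for complex video retrieval.
Improved retrieval performance on the multilingual MultiVENT 2.0 dataset through fine-tuning.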
Abstract
Effective video retrieval remains challenging due to the complexity of integrating visual, auditory, and textual modalities. In this paper, we explore unified retrieval methods using OmniEmbed, a powerful multimodal embedding model from the Tevatron 2.0 toolkit, in the context of the MAGMaR shared task. Evaluated on the comprehensive MultiVENT 2.0 dataset, OmniEmbed generates unified embeddings for text, images, audio, and video, enabling robust multimodal retrieval. By fine-tuning OmniEmbed on the combined multimodal data provided in MultiVENT 2.0 (visual frames, audio tracks, and textual descriptions), we achieve substantial improvements in complex, multilingual video retrieval tasks. Our submission achieved the highest score on the MAGMaR shared task leaderboard among public submissions as of May 20th, 2025, highlighting the practical effectiveness of our unified multimodal retrieval…
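To make the idea of retrieval over unified embeddings concrete, the following is a minimal sketch of the ranking step only. It assumes that queries and videos have already been mapped to vectors in the same space; the toy vectors, the function names (`l2_normalize`, `cosine_scores`, `retrieve`), and the use of cosine similarity are illustrative assumptions, not the OmniEmbed or Tevatron 2.0 API and not necessarily the scoring function used in the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_scores(query_vec, video_vecs):
    """Cosine similarity between one query vector and each video vector."""
    q = l2_normalize(query_vec)
    return {
        video_id: sum(a * b for a, b in zip(q, l2_normalize(v)))
        for video_id, v in video_vecs.items()
    }


def retrieve(query_vec, video_vecs, k=2):
    """Return the k highest-scoring (video_id, score) pairs."""
    scores = cosine_scores(query_vec, video_vecs)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical 3-dimensional embeddings standing in for model outputs.
toy_videos = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.7],
}
toy_query = [1.0, 0.0, 0.0]

top = retrieve(toy_query, toy_videos, k=2)
print(top)
```

In a real system the vectors would come from the embedding model, and an approximate nearest-neighbor index would typically replace the exhaustive loop over the collection.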
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
