Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
Mingxin Li, Yanzhao Zhang, Dingkun Long, Keqin Chen, Sibo Song, Shuai Bai, Zhibo Yang, Pengjun Xie, An Yang, Dayiheng Liu, Jingren Zhou, Junyang Lin

TL;DR
This paper introduces the Qwen3-VL-Embedding and Qwen3-VL-Reranker models, providing a unified, multilingual framework for high-precision multimodal retrieval and ranking across diverse data types.
Contribution
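The paper presents an end-to-end multimodal search pipeline that pairs an embedding model with a cross-encoder reranker. The embedding model is trained in multiple stages, from large-scale contrastive pre-training to reranking model distillation, and supports flexible embedding dimensions through Matryoshka Representation Learning. The paper reports state-of-the-art results on multiple benchmarks.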
Findings
Qwen3-VL-Embedding-8B achieves 77.8 on MMEB-V2, ranking first.
Models support over 30 languages and handle inputs up to 32k tokens.
Empirical results demonstrate superior performance in multimodal retrieval tasks.
Abstract
In this report, we introduce the Qwen3-VL-Embedding and Qwen3-VL-Reranker model series, the latest extensions of the Qwen family built on the Qwen3-VL foundation model. Together, they provide an end-to-end pipeline for high-precision multimodal search by mapping diverse modalities, including text, images, document images, and video, into a unified representation space. The Qwen3-VL-Embedding model employs a multi-stage training paradigm, progressing from large-scale contrastive pre-training to reranking model distillation, to generate semantically rich high-dimensional vectors. It supports Matryoshka Representation Learning, enabling flexible embedding dimensions, and handles inputs up to 32k tokens. Complementing this, Qwen3-VL-Reranker performs fine-grained relevance estimation for query-document pairs using a cross-encoder architecture with cross-attention mechanisms. Both model…
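The abstract describes two ideas that a small example can make concrete: Matryoshka-style embeddings that can be shortened to a smaller dimension, and a two-stage pipeline where a reranker rescores the candidates returned by embedding search. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a shorter embedding is obtained by keeping a prefix of the vector and re-normalizing it, which is the common way Matryoshka embeddings are used. The vectors are toy data, and `rerank_score` is a hypothetical stand-in for a cross-encoder relevance score; no real model API is called.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and re-normalize.

    Assumption: prefix truncation, as is typical for Matryoshka embeddings.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def cosine(a, b):
    """Cosine similarity of two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, dim, top_k):
    """Stage 1: rank documents by similarity of truncated embeddings.
    Stage 2: reorder the top_k candidates with a pairwise scorer.

    `rerank_score` is a hypothetical callable mapping a document index
    to a relevance score for the current query.
    """
    q = truncate_embedding(query_vec, dim)
    coarse = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(q, truncate_embedding(doc_vecs[i], dim)),
        reverse=True,
    )[:top_k]
    return sorted(coarse, key=rerank_score, reverse=True)


# Toy 4-dimensional embeddings (illustrative values only).
query = [1.0, 0.0, 0.2, 0.1]
docs = [
    [0.9, 0.1, 0.0, 0.3],
    [0.0, 1.0, 0.5, 0.0],
    [0.7, 0.7, 0.1, 0.1],
]

# Hypothetical reranker scores per document index for this query.
toy_scores = {0: 0.2, 1: 0.9, 2: 0.8}

ranked = retrieve_then_rerank(query, docs, toy_scores.get, dim=2, top_k=2)
print(ranked)
```

In this toy run, the 2-dimensional embedding search keeps documents 0 and 2, and the stand-in reranker then places document 2 ahead of document 0. Document 1 has the highest reranker score but never reaches the second stage, which illustrates why the recall of the embedding stage bounds what the reranker can recover.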
Code & Models
- 🤗 Qwen/Qwen3-VL-Embedding-8B (model): 703k downloads, 373 likes
- 🤗 Qwen/Qwen3-VL-Embedding-2B (model): 1.2M downloads, 362 likes
- 🤗 Qwen/Qwen3-VL-Reranker-8B (model): 189k downloads, 135 likes
- 🤗 Qwen/Qwen3-VL-Reranker-2B (model): 223k downloads, 181 likes
- 🤗 RamManavalan/Qwen3-VL-Embedding-8B-FP8 (model): 1.7k downloads, 4 likes
- 🤗 z9181317/Qwen3-VL-Embedding-8B (model): 18 downloads
- 🤗 tomaarsen/Qwen3-VL-Embedding-2B (model): 269 downloads
- 🤗 tomaarsen/Qwen3-VL-Embedding-8B (model): 19 downloads
- 🤗 tomaarsen/Qwen3-VL-Reranker-2B (model): 58 downloads
- 🤗 tomaarsen/Qwen3-VL-Reranker-8B (model): 37 downloads
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
