UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion

Zihan Liang; Yufei Ma; ZhiPeng Qian; Huangyu Dai; Zihan Wang; Ben Chen; Chenyi Lei; Yuqing Ding; Han Li

arXiv:2508.13843 · cs.IR · August 20, 2025

TL;DR

UniECS is a versatile multimodal e-commerce search framework that unifies various retrieval tasks, employs a novel gated fusion encoder, and outperforms existing methods on a new comprehensive benchmark, demonstrating real-world effectiveness.

Contribution

We introduce UniECS, a flexible architecture with a gated multimodal encoder, a comprehensive training strategy, and a new benchmark for unified e-commerce multimodal retrieval.
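As a rough illustration of what a gated multimodal encoder might do, the sketch below fuses an image embedding and a text embedding with a learned per-dimension gate, and falls back to the available embedding when one modality is missing. This is not the paper's encoder: the function name `gated_fuse`, the parameters `gate_w` and `gate_b`, and the sigmoid-gate formulation are all assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(
    img: Optional[Sequence[float]],
    txt: Optional[Sequence[float]],
    gate_w: Sequence[Sequence[float]],
    gate_b: Sequence[float],
) -> List[float]:
    """Hypothetical gated fusion of two embeddings of equal size d.

    gate_w has shape (d, 2d) and gate_b has shape (d,). Both are
    illustrative stand-ins for learned parameters.
    """
    if img is None and txt is None:
        raise ValueError("at least one modality is required")
    # Missing-modality handling (assumed): pass the present embedding through.
    if img is None:
        return list(txt)
    if txt is None:
        return list(img)
    if len(img) != len(txt):
        raise ValueError("embeddings must have the same dimension")

    concat = list(img) + list(txt)
    fused = []
    for i, (a, b) in enumerate(zip(img, txt)):
        # One gate value per output dimension, computed from both modalities.
        z = sum(w * x for w, x in zip(gate_w[i], concat)) + gate_b[i]
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# With all-zero gate parameters every gate is 0.5, so fusion is a plain average.
d = 2
zero_w = [[0.0] * (2 * d) for _ in range(d)]
zero_b = [0.0] * d
both = gated_fuse([1.0, 0.0], [0.0, 1.0], zero_w, zero_b)
text_only = gated_fuse(None, [0.3, 0.7], zero_w, zero_b)
```

The gate lets the model weight each modality per dimension instead of using a fixed concatenation or average, which is one plausible reading of "adaptive fusion" in the summary above.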

Findings

01. Outperforms existing methods across four benchmarks.

02. Achieves up to 28% improvement in text-to-image retrieval.

03. Deployed successfully on a real-world e-commerce platform, boosting CTR and revenue.

Abstract

Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and lack comprehensive benchmarks for evaluating unified retrieval approaches. To address these challenges, we introduce UniECS, a unified multimodal e-commerce search framework that handles all retrieval scenarios across image, text, and their combinations. Our work makes three key contributions. First, we propose a flexible architecture with a novel gated multimodal encoder that uses adaptive fusion mechanisms. This encoder integrates different modality representations while handling missing modalities. Second, we develop a comprehensive training strategy to optimize learning. It combines cross-modal alignment loss (CMAL), cohesive local alignment loss (CLAL), intra-modal contrastive loss (IMCL), and adaptive loss weighting. Third, we create M-BEER,…
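The abstract names the training losses but this excerpt does not define them. To give a feel for what a cross-modal alignment objective commonly looks like, here is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over a batch of paired image and text embeddings. It is an assumption for illustration, not the paper's CMAL; the function name `symmetric_contrastive_loss` and the `temperature` default are hypothetical.

```python
import math
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def _diag_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(
    img_embs: Sequence[Sequence[float]],
    txt_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not taken from the paper
) -> float:
    """Generic image-to-text plus text-to-image contrastive loss."""
    imgs = [_normalize(v) for v in img_embs]
    txts = [_normalize(v) for v in txt_embs]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [[_dot(i, t) / temperature for t in txts] for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits_t))


# Matched pairs should score a lower loss than deliberately swapped pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

In a multi-loss setup like the one the abstract describes, a term of this kind would be combined with the other losses through some weighting scheme; how UniECS weights them is not specified in this excerpt.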
