UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion
Zihan Liang, Yufei Ma, ZhiPeng Qian, Huangyu Dai, Zihan Wang, Ben Chen, Chenyi Lei, Yuqing Ding, Han Li

TL;DR
UniECS is a versatile multimodal e-commerce search framework that unifies retrieval across image, text, and mixed image-text queries, employs a novel gated fusion encoder, outperforms existing methods on a new comprehensive benchmark, and proves effective in real-world deployment.
Contribution
We introduce UniECS, a flexible architecture with a gated multimodal encoder, a comprehensive training strategy, and a new benchmark for unified e-commerce multimodal retrieval.
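To make the idea of a gated multimodal encoder concrete, here is a minimal sketch of gated fusion with missing-modality handling. It is not the authors' encoder: the scalar sigmoid gate computed from the concatenated embeddings, the function name gated_fuse, and the parameters gate_weights and gate_bias are all hypothetical choices made for illustration. When one modality is absent, the sketch simply falls back to the other embedding.

```python
import math
from typing import List, Optional

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fuse(
    image: Optional[Vector],
    text: Optional[Vector],
    gate_weights: Vector,
    gate_bias: float = 0.0,
) -> Vector:
    """Hypothetical gated fusion of an image and a text embedding.

    g = sigmoid(w . [image; text] + b) controls how much of the image
    embedding is kept versus the text embedding. If one modality is
    missing, the other embedding is returned unchanged and the gate is unused.
    """
    if image is None and text is None:
        raise ValueError("at least one modality is required")
    if image is None:
        return list(text)
    if text is None:
        return list(image)
    if len(image) != len(text):
        raise ValueError("embeddings must share a dimension")
    concat = image + text  # list concatenation: [image; text]
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    # Convex combination of the two modality embeddings.
    return [g * i + (1.0 - g) * t for i, t in zip(image, text)]
```

With all-zero gate weights the gate is 0.5 and the result is the plain average of the two embeddings; a learned gate lets the model lean toward whichever modality is more informative for a given item. A real encoder would typically compute a vector-valued gate with a trained projection rather than a single scalar.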
Findings
Outperforms existing methods across four benchmarks.
Achieves up to a 28% improvement in text-to-image retrieval.
Deployed successfully on a real-world e-commerce platform, boosting CTR and revenue.
Abstract
Current e-commerce multimodal retrieval systems face two key limitations: they optimize for specific tasks with fixed modality pairings, and they lack comprehensive benchmarks for evaluating unified retrieval approaches. To address these challenges, we introduce UniECS, a unified multimodal e-commerce search framework that handles all retrieval scenarios across image, text, and their combinations. Our work makes three key contributions. First, we propose a flexible architecture with a novel gated multimodal encoder that uses adaptive fusion mechanisms. This encoder integrates different modality representations while handling missing modalities. Second, we develop a comprehensive training strategy that combines cross-modal alignment loss (CMAL), cohesive local alignment loss (CLAL), intra-modal contrastive loss (IMCL), and adaptive loss weighting. Third, we create M-BEER,…
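The abstract names the training losses but not their formulas, so the following is only a sketch under stated assumptions. It assumes CMAL and IMCL take a standard InfoNCE (contrastive) form over L2-normalised embeddings, omits CLAL because the abstract does not describe it, and substitutes uncertainty-based weighting (a commonly used simplified form, sum of exp(-s_k) * L_k + s_k) for the paper's unspecified adaptive loss weighting. The function names, the temperature of 0.07, and the loss keys "cmal" and "imcl" are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def info_nce(queries: Matrix, keys: Matrix, temperature: float = 0.07) -> float:
    """Mean InfoNCE loss; queries[i] and keys[i] form the positive pair.

    Assumes embeddings are already L2-normalised, so a dot product equals
    cosine similarity. Every other key in the batch acts as a negative.
    """
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / temperature
            for j in range(n)
        ]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax of the positive
    return total / n


def symmetric_info_nce(a: Matrix, b: Matrix, temperature: float = 0.07) -> float:
    # Average both retrieval directions (a -> b and b -> a).
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))


def weighted_total(losses: Dict[str, float], log_vars: Dict[str, float]) -> float:
    """Uncertainty-style weighting: sum over k of exp(-s_k) * L_k + s_k.

    In a real training loop each s_k (log_vars[k]) would be a learnable
    parameter; here they are plain floats for illustration.
    """
    return sum(math.exp(-log_vars[k]) * losses[k] + log_vars[k] for k in losses)
```

For example, with image and text embeddings [[1.0, 0.0], [0.0, 1.0]] for both modalities, symmetric_info_nce is close to zero because each positive pair is far more similar than its negatives; swapping the text rows makes the loss large. The + s_k term in weighted_total keeps the learned weights from collapsing to zero.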
