Valley3: Scaling Omni Foundation Models for E-commerce
Zeyu Chen, Guanghao Zhou, Qixiang Yin, Ziwang Zhao, Huanjin Yao, Pengjiu Xia, Min Yang, Cen Chen, Minghui Qiu

TL;DR
Valley3 is an omni multimodal large language model for diverse e-commerce tasks. It combines native multilingual audio support with cross-modal understanding and reasoning, and offers enhanced search and reasoning modes.
Contribution
The paper introduces Valley3, an omni e-commerce MLLM with native multilingual audio support, a four-stage continued pre-training pipeline, and advanced reasoning and search functionalities.
Findings
Valley3 outperforms strong baselines on proprietary and open-source e-commerce benchmarks.
The model demonstrates effective long-context reasoning and multimodal understanding.
Post-training improves Valley3's reasoning depth and efficiency.
Abstract
In this work, we present Valley3, an omni multimodal large language model (MLLM) developed for diverse global e-commerce tasks, with unified understanding and reasoning capabilities across text, images, video, and audio. A key feature of Valley3 is its native multilingual audio capability for e-commerce, developed by extending vision-language models to better support crucial audio-visual tasks, particularly in short-video scenarios. To achieve this, we carefully design a four-stage omni e-commerce continued pre-training pipeline, through which Valley3 progressively acquires audio understanding, cross-modal instruction-following, e-commerce domain knowledge, and long-context reasoning capabilities, ultimately evolving into an omni model for diverse e-commerce scenarios. Then, we further improve Valley3 through post-training to encourage long-chain reasoning with controllable reasoning…
