MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding
Daoze Zhang, Chenghan Fu, Zhanheng Nie, Jianyu Liu, Wanxian Guan, Yuan Gao, Jun Song, Pengjie Wang, Jian Xu, Bo Zheng

TL;DR
The paper introduces MOON, a generative multimodal large language model (MLLM) for e-commerce product understanding. It targets three challenges: the lack of multimodal and aspect-aware modeling modules in typical LLMs, background noise in product images, and the absence of a standard evaluation benchmark. The authors report strong zero-shot performance.
Contribution
This work presents what the authors describe as the first generative MLLM-based model for product representation learning, incorporating a Mixture-of-Experts module, semantic region detection, and a new benchmark dataset.
Findings
Demonstrates competitive zero-shot performance on multiple tasks
Effectively models multimodal and aspect-specific content
Shows strong generalization across downstream tasks
Abstract
With the rapid advancement of e-commerce, exploring general representations rather than task-specific ones has attracted increasing research attention. For product understanding, although existing discriminative dual-flow architectures have driven progress in this field, they inherently struggle to model the many-to-one alignment between multiple images and texts of products. Therefore, we argue that generative Multimodal Large Language Models (MLLMs) hold significant potential for improving product representation learning. Nevertheless, achieving this goal remains non-trivial due to several key challenges: the lack of multimodal and aspect-aware modeling modules in typical LLMs; the common presence of background noise in product images; and the absence of a standard benchmark for evaluation. To address these issues, we propose the first generative MLLM-based model named MOON for…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies
