MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents
Ming Gong, Xucheng Huang, Chenghan Yang, Xianhan Peng, Haoxin Wang, Yang Liu, Ling Jiang

TL;DR
MindFlow is an open-source multimodal LLM agent designed for e-commerce customer support, significantly enhancing complex query handling, user satisfaction, and operational efficiency through a modular architecture and visual-textual reasoning.
Contribution
It introduces the first open-source multimodal LLM agent for e-commerce, integrating memory, decision-making, and visual-textual reasoning within the CoALA framework.
Findings
93.53% relative improvement in real-world deployment
Enhanced handling of complex, multimodal customer queries
Reduced operational costs and increased user satisfaction
Abstract
Recent advances in large language models (LLMs) have enabled new applications in e-commerce customer service. However, their capabilities remain constrained in complex, multimodal scenarios. We present MindFlow, the first open-source multimodal LLM agent tailored for e-commerce. Built on the CoALA framework, it integrates memory, decision-making, and action modules, and adopts a modular "MLLM-as-Tool" strategy for effective visual-textual reasoning. Evaluated via online A/B testing and simulation-based ablation, MindFlow demonstrates substantial gains in handling complex queries, improving user satisfaction, and reducing operational costs, with a 93.53% relative improvement observed in real-world deployments.
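The abstract describes the agent only at the module level. The Python sketch below illustrates one possible reading of a CoALA-style loop with an "MLLM-as-Tool" step. It is not MindFlow's implementation. It assumes a text-only LLM that answers from working memory, plus a multimodal model that is called as a tool only when the customer attaches an image, so that its textual description is written back into memory. Every name here (Memory, Agent, decide, describe_image) is hypothetical. Both model calls are stubbed with plain functions, and the rule-based decide() is a simplification; the paper's decision module is presumably richer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for model calls; a real system would call LLM/MLLM APIs.
TextLLM = Callable[[str], str]               # prompt -> reply
VisionTool = Callable[[bytes, str], str]     # (image, question) -> description


@dataclass
class Memory:
    """Hypothetical working memory: the running dialogue as (role, text) turns."""
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def render(self) -> str:
        # Flatten the dialogue into a single prompt for the text LLM.
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


@dataclass
class Agent:
    llm: TextLLM
    describe_image: VisionTool
    memory: Memory = field(default_factory=Memory)

    def decide(self, image: Optional[bytes]) -> str:
        # Simplified decision module: use the vision tool only if an image is attached.
        return "call_vision_tool" if image is not None else "respond"

    def step(self, user_text: str, image: Optional[bytes] = None) -> str:
        self.memory.add("user", user_text)
        if self.decide(image) == "call_vision_tool":
            # MLLM-as-Tool: the multimodal model converts the image to text,
            # which is stored in memory so the text LLM can reason over it.
            self.memory.add("tool", self.describe_image(image, user_text))
        # Action module: generate the customer-facing reply from memory.
        reply = self.llm(self.memory.render())
        self.memory.add("assistant", reply)
        return reply


if __name__ == "__main__":
    # Stubbed models for illustration only.
    agent = Agent(
        llm=lambda prompt: "Reply based on:\n" + prompt,
        describe_image=lambda img, q: f"[image of {len(img)} bytes, re: {q}]",
    )
    print(agent.step("Is this jacket damaged?", image=b"fake-image-bytes"))
```

Keeping the vision model behind a tool boundary means the text agent's memory and decision logic stay unchanged when the multimodal backend is swapped. This appears to be the modularity the abstract attributes to the "MLLM-as-Tool" strategy.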
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Topic Modeling
