EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce
Minhyeong Yu, Wonduk Seo

TL;DR
EPM-RL introduces a reinforcement learning framework for accurate, efficient, and private on-premise product mapping in e-commerce, reducing reliance on costly external APIs.
Contribution
The paper presents a novel RL-based approach that distills high-cost reasoning into a trainable in-house model for scalable product mapping.
Findings
EPM-RL outperforms PEFT-only models in accuracy and cost-efficiency.
It offers a better quality-cost trade-off than API-based baselines.
It enables private deployment and reduces operational costs.
Abstract
Product mapping, the task of deciding whether two e-commerce listings refer to the same product, is a core problem for price monitoring and channel visibility. In real marketplaces, however, sellers frequently inject promotional keywords, platform-specific tags, and bundle descriptions into titles, causing the same product to appear under many different names. Recent LLM-based and multi-agent frameworks improve robustness and interpretability on such hard cases, but they often rely on expensive external APIs, repeated retrieval, and complex inference-time orchestration, making large-scale deployment costly and difficult in privacy-sensitive enterprise settings. To address these issues, we present EPM-RL, a reinforcement-learning-based framework for building an accurate and efficient on-premise e-commerce product mapping model. Our central idea is to distill high-cost agentic reasoning…
