EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce

Minhyeong Yu; Wonduk Seo

arXiv:2604.23993·cs.CL·April 28, 2026

TL;DR

EPM-RL introduces a reinforcement learning framework for accurate, efficient, and private on-premise product mapping in e-commerce, reducing reliance on costly external APIs.

Contribution

The paper presents an RL-based approach that distills high-cost agentic reasoning into a trainable in-house model for scalable product mapping.

Findings

01

EPM-RL outperforms PEFT-only models in accuracy and cost-efficiency.

02

It offers a better quality-cost trade-off than API-based baselines.

03

It enables private, on-premise deployment and reduces operational costs.

Abstract

Product mapping, the task of deciding whether two e-commerce listings refer to the same product, is a core problem for price monitoring and channel visibility. In real marketplaces, however, sellers frequently inject promotional keywords, platform-specific tags, and bundle descriptions into titles, causing the same product to appear under many different names. Recent LLM-based and multi-agent frameworks improve robustness and interpretability on such hard cases, but they often rely on expensive external APIs, repeated retrieval, and complex inference-time orchestration, making large-scale deployment costly and difficult in privacy-sensitive enterprise settings. To address these issues, we present EPM-RL, a reinforcement-learning-based framework for building an accurate and efficient on-premise e-commerce product mapping model. Our central idea is to distill high-cost agentic reasoning…
