Attribute-Aware Controlled Product Generation with LLMs for E-commerce
Virginia Negri, Víctor Martínez Gómez, Sergio A. Balanya, Subburam Rajaram

TL;DR
This paper introduces a systematic method using Large Language Models to generate high-quality synthetic e-commerce product data with controlled attributes, enhancing dataset quality for improved product information extraction.
Contribution
The paper presents a novel attribute-aware controlled data generation framework using LLMs, enabling high-quality synthetic product data creation for e-commerce applications.
Findings
Synthetic data achieves 60.5% accuracy on the MAVE dataset, comparable to the 60.8% achieved with real training data.
Human evaluation shows 99.6% naturalness and 96.5% valid attributes in synthetic products.
Combining synthetic and real data improves accuracy to 68.8%.
Abstract
Product information extraction is crucial for e-commerce services, but obtaining high-quality labeled datasets remains challenging. We present a systematic approach for generating synthetic e-commerce product data using Large Language Models (LLMs), introducing a controlled modification framework with three strategies: attribute-preserving modification, controlled negative example generation, and systematic attribute removal. Using a state-of-the-art LLM with attribute-aware prompts, we enforce store constraints while maintaining product coherence. Human evaluation of 2000 synthetic products demonstrates high effectiveness, with 99.6% rated as natural, 96.5% containing valid attribute values, and over 90% showing consistent attribute usage. On the public MAVE dataset, our synthetic data achieves 60.5% accuracy, performing on par with real training data (60.8%) and significantly…
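The three modification strategies named in the abstract can be pictured as three instruction templates applied to the same product. The sketch below is illustrative only: the abstract names the strategies but does not define them, so the wording of each template, the `Product` and `Strategy` types, the `build_prompt` helper, and the `allowed_values` parameter (standing in for store constraints) are all assumptions rather than the authors' actual prompts or code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Strategy(Enum):
    PRESERVE = "preserve"   # attribute-preserving modification
    NEGATIVE = "negative"   # controlled negative example generation
    REMOVE = "remove"       # systematic attribute removal


# Hypothetical instruction templates; the paper's real prompts are not shown here.
TEMPLATES = {
    Strategy.PRESERVE: (
        "Rewrite the product text in new words, "
        "keeping {attr} = '{value}' explicitly stated."
    ),
    Strategy.NEGATIVE: (
        "Rewrite the product text so that {attr} = '{value}' no longer holds, "
        "keeping the product coherent."
    ),
    Strategy.REMOVE: "Rewrite the product text without any mention of {attr}.",
}


@dataclass
class Product:
    text: str
    attributes: dict = field(default_factory=dict)


def build_prompt(product, strategy, attr, allowed_values=()):
    """Assemble an attribute-aware prompt for one strategy (illustrative only)."""
    value = product.attributes[attr]  # raises KeyError if the attribute is unknown
    lines = [
        f"Product: {product.text}",
        TEMPLATES[strategy].format(attr=attr, value=value),
    ]
    if allowed_values:
        # Assumed stand-in for store constraints: restrict the value vocabulary.
        lines.append(f"Allowed values for {attr}: " + ", ".join(allowed_values))
    return "\n".join(lines)


if __name__ == "__main__":
    shirt = Product("Red cotton t-shirt", {"color": "red"})
    for strategy in Strategy:
        print(build_prompt(shirt, strategy, "color", ["red", "blue", "green"]))
        print("---")
```

The resulting string would be sent to whichever LLM is in use; that call is deliberately left out because the abstract does not name the model or its API.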
Taxonomy
Topics: Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining · Recommender Systems and Techniques
