SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce
Alberto Castelo, Zahra Zanjani Foumani, Ailin Fan, Keat Yang Koay, Vibhor Malik, Yuanzheng Zhu, Han Li, Meysam Feghhi, Ronie Uliana, Shuang Xie, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara

TL;DR
SimGym is a scalable system that uses traffic-grounded synthetic agents powered by large language models to perform rapid offline A/B testing in e-commerce, significantly reducing testing time from weeks to under an hour.
Contribution
We introduce SimGym, a novel traffic-grounded simulation platform that enables fast, accurate offline A/B testing for e-commerce UI changes using LLM-powered synthetic buyers.
Findings
SimGym achieves state-of-the-art alignment with real user outcomes.
It reduces A/B testing cycles from weeks to under an hour.
Validation on a major e-commerce platform confirms effectiveness.
Abstract
A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by large language model (LLM) agents operating in a live browser. SimGym extracts per-shop buyer profiles and intents from production interaction data, identifies distinct behavioral archetypes, and simulates cohort-weighted sessions across control and treatment storefronts. We validate SimGym against real human outcomes from real UI changes on a major e-commerce platform under confounder control. Even without alignment post-training, SimGym agents achieve state-of-the-art alignment with observed outcome shifts and reduce experiment cycles from weeks to under an hour, enabling rapid experimentation without…
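The abstract says SimGym simulates "cohort-weighted sessions" for behavioral archetypes across control and treatment storefronts. One plausible way to turn such sessions into an A/B estimate is sketched below. It is a minimal illustration under stated assumptions, not the authors' implementation: the archetype names, their traffic shares, the use of a boolean conversion outcome per simulated session, and the relative-lift formula are all hypothetical choices made for the example. Each archetype's simulated conversion rate is weighted by that archetype's share of real traffic, and the weighted rates of the two storefronts are then compared.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Archetype:
    # Hypothetical: a behavioral archetype and its share of the shop's real traffic.
    name: str
    traffic_share: float


def weighted_rate(outcomes: dict[str, list[bool]], archetypes: list[Archetype]) -> float:
    """Cohort-weighted conversion rate (assumed outcome: one bool per simulated session).

    Each archetype's simulated conversion rate is weighted by its normalized
    traffic share, so archetypes are represented in proportion to real traffic
    rather than in proportion to how many sessions were simulated for them.
    """
    total_share = sum(a.traffic_share for a in archetypes)
    rate = 0.0
    for a in archetypes:
        sessions = outcomes[a.name]
        if not sessions:
            raise ValueError(f"no simulated sessions for archetype {a.name!r}")
        rate += (a.traffic_share / total_share) * (sum(sessions) / len(sessions))
    return rate


def simulated_lift(
    control: dict[str, list[bool]],
    treatment: dict[str, list[bool]],
    archetypes: list[Archetype],
) -> float:
    """Relative lift of treatment over control (hypothetical summary metric)."""
    c = weighted_rate(control, archetypes)
    t = weighted_rate(treatment, archetypes)
    if c == 0:
        raise ValueError("control conversion rate is zero; relative lift is undefined")
    return (t - c) / c


# Usage with made-up simulated outcomes (True = simulated session converted).
archetypes = [Archetype("bargain_hunter", 0.6), Archetype("browser", 0.4)]
control = {
    "bargain_hunter": [True, False, False, False],
    "browser": [True, False, False, False, False],
}
treatment = {
    "bargain_hunter": [True, True, False, False],
    "browser": [True, False, False, False, False],
}
print(weighted_rate(control, archetypes), weighted_rate(treatment, archetypes))
print(simulated_lift(control, treatment, archetypes))
```

Weighting by traffic share rather than pooling all sessions keeps the estimate tied to the shop's real buyer mix even when some archetypes receive more simulated sessions than others, which is one reading of what "traffic-grounded" and "cohort-weighted" imply here.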
