Sell More, Play Less: Benchmarking LLM Realistic Selling Skill

Xuanbo Su; Wenhao Hu; Haibo Su; Yunzhang Chen; Le Zhan; Yanqi Yang; Leo Huang

arXiv:2604.07054·cs.CL·April 10, 2026

TL;DR

This paper introduces SalesLLM, a bilingual (ZH/EN) benchmark for evaluating large language models in realistic sales dialogues, together with a fully automatic evaluation pipeline and a user simulation model, CustomerLM.

Contribution

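It presents a benchmark built from 30,074 scripted configurations and 1,805 curated multi-turn scenarios, an automatic evaluation pipeline that scores sales-process progress and end-of-dialogue buying intent, and a trained user model. Together these address a gap in existing dialogue benchmarks, which rarely measure deal progression and outcomes.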

Findings

01

SalesLLM scores strongly correlate with human ratings (Pearson r=0.98).

02

Top LLMs perform comparably to humans in sales dialogues.

03

Less capable LLMs underperform compared to humans.
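
For readers unfamiliar with the statistic behind the first finding, the following is a minimal sketch of how a Pearson correlation between benchmark scores and human ratings could be computed. The function name `pearson_r` and the two score lists are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-model scores, for illustration only (NOT results from the paper).
benchmark_scores = [62.0, 71.5, 55.0, 80.0, 68.0]
human_ratings = [3.1, 3.6, 2.8, 4.2, 3.4]

print(f"Pearson r = {pearson_r(benchmark_scores, human_ratings):.3f}")
```

A value near 1 means the automatic scores rank and space the systems almost exactly as the human raters do, which is the sense in which the reported r=0.98 supports the benchmark's validity.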

Abstract

Sales dialogues require multi-turn, goal-directed persuasion under asymmetric incentives, which makes them a challenging setting for large language models (LLMs). Yet existing dialogue benchmarks rarely measure deal progression and outcomes. We introduce SalesLLM, a bilingual (ZH/EN) benchmark derived from realistic applications covering Financial Services and Consumer Goods, built from 30,074 scripted configurations and 1,805 curated multi-turn scenarios with controllable difficulty and personas. We propose a fully automatic evaluation pipeline that combines (i) an LLM-based rater for sales-process progress, and (ii) fine-tuned BERT classifiers for end-of-dialogue buying intent. To improve simulation fidelity, we train a user model, CustomerLM, with SFT and DPO on 8,000+ crowdworker-involved sales conversations, reducing role inversion from 17.44% (GPT-4o) to 8.8%. SalesLLM…
