Benchmarking and Learning Real-World Customer Service Dialogue

Tianhong Gao; Jundong Shen; Jiapeng Wang; Bei Shi; Ying Ju; Junfeng Yao; Huiyu Yu

arXiv:2510.22143·cs.CL·January 13, 2026

TL;DR

This paper introduces OlaBench, a customer service dialogue benchmark, and OlaMind, a reinforcement learning approach that improves large language models' performance on real-world intelligent customer service (ICS) tasks, narrowing the gap between offline metrics and deployment success.

Contribution

The paper presents OlaBench for realistic ICS evaluation and OlaMind for reinforcement learning-based model improvement, addressing gaps in existing benchmarks and training pipelines.

Findings

01 OlaMind outperforms GPT-5.2 and Gemini 3 Pro on OlaBench.

02 OlaMind achieves a +23.67% gain in issue resolution in online tests.

03 OlaBench evaluates service capability, safety, and latency sensitivity.

Abstract

Existing benchmarks and training pipelines for industrial intelligent customer service (ICS) remain misaligned with real-world dialogue requirements, overemphasizing verifiable task success while under-measuring subjective service quality and realistic failure modes, leaving a gap between offline gains and deployable dialogue behavior. We close this gap with a benchmark-to-optimization loop: we first introduce OlaBench, an ICS benchmark spanning retrieval-augmented generation, workflow-based systems, and agentic settings, which evaluates service capability, safety, and latency sensitivity; moreover, motivated by OlaBench results showing state-of-the-art LLMs still fall short, we propose OlaMind, which distills reusable reasoning patterns and service strategies from expert dialogues and applies rubric-aware staged exploration-exploitation reinforcement learning to improve model…
