From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service
Haoyu He, Jinyu Zhuang, Haoran Chu, Shuhang Yu, J&T AI Group, Hao Wang, and Kunpeng Han

TL;DR
This paper introduces a new multilingual intent classification benchmark based on real customer service logs, revealing that synthetic data overestimates model performance and highlighting the importance of native data for realistic evaluation.
Contribution
It provides a large, real-world multilingual dataset with native and translated test sets, enabling more accurate benchmarking of intent classification models in logistics.
Findings
Translated test sets overestimate performance on native queries
Native data reveals challenges in long-tail intent classification
Benchmark highlights the gap between synthetic and real-world multilingual evaluation
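The findings above compare one model on a translated test set and on a native test set of the same intents. A minimal sketch of that comparison, assuming gold and predicted labels are available as plain lists, reports the translated-minus-native difference in accuracy and in macro-F1. Macro-F1 is included because it weights every intent equally, so failures on rare (long-tail) intents show up more clearly than in accuracy. The function names (`accuracy`, `macro_f1`, `translation_gap`) and the toy intent labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of queries whose predicted intent matches the gold intent."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-intent F1 over intents seen in gold or pred.

    Every intent counts equally, so errors on rare (long-tail) intents
    weigh as much as errors on frequent ones.
    """
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted intent p was wrong
            fn[g] += 1  # gold intent g was missed
    f1s = []
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        f1s.append(2 * tp[lab] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def translation_gap(translated, native):
    """Metric on the translated set minus the same metric on the native set.

    Each argument is a (gold, pred) pair of equal-length label lists.
    A positive gap means the translated set overestimates native performance.
    """
    return {
        "accuracy": accuracy(*translated) - accuracy(*native),
        "macro_f1": macro_f1(*translated) - macro_f1(*native),
    }


# Hypothetical toy labels: the model is perfect on translated queries but
# collapses the rare "refund" intent into "track" on native queries.
translated = (["track", "track", "refund", "address"],
              ["track", "track", "refund", "address"])
native = (["track", "track", "refund", "address"],
          ["track", "track", "track", "address"])
print(translation_gap(translated, native))
```

In this toy case the accuracy gap is 0.25 while the macro-F1 gap is about 0.4, because the single missed intent is a whole class. This is the kind of divergence one would expect when long-tail intents are harder on native data.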
Abstract
Multilingual intent classification is central to customer-service systems on global logistics platforms, where models must process noisy user queries across languages and hierarchical label spaces. Yet most existing multilingual benchmarks rely on machine-translated text, which is typically cleaner and more standardized than native customer requests and can therefore overestimate real-world robustness. We present a public benchmark for hierarchical multilingual intent classification constructed from real logistics customer-service logs. The dataset contains approximately 30K de-identified, stand-alone user queries curated from 600K historical records through filtering, LLM-assisted quality control, and human verification, and is organized into a two-level taxonomy with 13 parent and 17 leaf intents. English, Spanish, and Arabic are included as seen languages, while Indonesian, Chinese,…
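Because the labels form a two-level taxonomy, a prediction can be scored at both levels. A minimal sketch, assuming each leaf intent belongs to exactly one parent and the model predicts leaf intents, derives the parent prediction from the predicted leaf and reports parent-level and leaf-level accuracy. The taxonomy dictionary `LEAF_TO_PARENT`, its intent names, and `hierarchical_accuracy` are hypothetical illustrations; the benchmark's actual 13 parent and 17 leaf intents are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level taxonomy (leaf intent -> parent intent).
# Illustrative names only; not the benchmark's real label set.
LEAF_TO_PARENT: Dict[str, str] = {
    "where_is_my_parcel": "tracking",
    "delivery_delayed": "tracking",
    "change_address": "delivery_change",
    "reschedule_delivery": "delivery_change",
    "refund_status": "refund",
}


def hierarchical_accuracy(gold_leaves: List[str], pred_leaves: List[str],
                          leaf_to_parent: Dict[str, str]) -> Tuple[float, float]:
    """Return (parent_accuracy, leaf_accuracy) for leaf-level predictions.

    The parent prediction is derived from the predicted leaf, so a wrong
    leaf under the correct parent still counts as correct at parent level.
    """
    if len(gold_leaves) != len(pred_leaves) or not gold_leaves:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(gold_leaves)
    leaf_hits = sum(g == p for g, p in zip(gold_leaves, pred_leaves))
    parent_hits = sum(leaf_to_parent[g] == leaf_to_parent[p]
                      for g, p in zip(gold_leaves, pred_leaves))
    return parent_hits / n, leaf_hits / n


gold = ["where_is_my_parcel", "change_address", "refund_status", "delivery_delayed"]
pred = ["delivery_delayed", "change_address", "where_is_my_parcel", "delivery_delayed"]
print(hierarchical_accuracy(gold, pred, LEAF_TO_PARENT))  # (0.75, 0.5)
```

Deriving the parent from the leaf keeps the two levels consistent by construction; a model with a separate parent head could instead be scored on each head independently, which may disagree with its own leaf prediction.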
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Graph Neural Networks · Machine Learning and Data Classification
