Real-World En Call Center Transcripts Dataset with PII Redaction
Ha Dao, Gaurav Chawla, Raghu Banda, Caleb DeLeeuw

TL;DR
This paper presents CallCenterEN, a large-scale, PII-redacted dataset of real-world English call center transcripts, supporting research in customer support AI while ensuring privacy compliance. The underlying audio is not part of the public release.
Contribution
The paper introduces the largest open-source call center transcript dataset with PII redaction, covering diverse accents and supporting non-commercial research.
Findings
Largest open-source call center transcript dataset to date
Includes diverse accents from India, the Philippines, and the United States
Ensures PII removal for privacy compliance
Abstract
We introduce CallCenterEN, a large-scale (91,706 conversations, corresponding to 10,448 audio hours), real-world English call center transcript dataset designed to support research and development in customer support and sales AI systems. This is the largest release to date of open-source call center transcript data of this kind. The dataset includes inbound and outbound calls between agents and customers, with accents from India, the Philippines and the United States. The dataset includes high-quality, PII-redacted human-readable transcriptions. All personally identifiable information (PII) has been rigorously removed to ensure compliance with global data protection laws. The audio is not included in the public release due to biometric privacy concerns. Given the scarcity of publicly available real-world call center datasets, CallCenterEN fills a critical gap in the landscape of…
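To give a sense of scale, the two figures quoted in the abstract imply an average call length. The following is a minimal sketch of that arithmetic; the function name `mean_call_minutes` is hypothetical and chosen only for illustration, and the result is a simple mean that says nothing about how call lengths are actually distributed.

```python
# Figures quoted in the abstract.
CONVERSATIONS = 91_706
AUDIO_HOURS = 10_448


def mean_call_minutes(conversations: int, audio_hours: float) -> float:
    """Average audio length per conversation, in minutes (hypothetical helper)."""
    if conversations <= 0:
        raise ValueError("conversations must be positive")
    return audio_hours * 60 / conversations


avg = mean_call_minutes(CONVERSATIONS, AUDIO_HOURS)
print(f"{avg:.1f} minutes per conversation")  # prints "6.8 minutes per conversation"
```

Under these numbers, a typical conversation runs a little under seven minutes of audio.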
Code & Models
- AIxBlock/92k-real-world-call-center-scripts-english (dataset) · 459 dl
- shadye-6/92k-real-world-call-center-scripts-english (dataset) · 7 dl
- yevgeniy03/92k-real-world-call-center-scripts-english (dataset) · 9 dl
- waterandwood/92k-real-world-call-center-scripts-english (dataset) · 3 dl
- 1159rp/92k-real-world-call-center-scripts-english (dataset) · 29 dl
