A combined approach to the analysis of speech conversations in a contact center domain
Andrea Brunello, Enrico Marzano, Angelo Montanari, Guido Sciavicco

TL;DR
This paper presents a speech analytics pipeline for contact centers, covering speech-to-text conversion, semantic tagging of transcripts, and classification, and reports gains in both performance and interpretability when analyzing customer-agent conversations.
Contribution
It introduces an integrated methodology combining speech recognition, semantic tagging, and decision tree classification tailored for contact center data analysis.
Findings
Kaldi-based speech-to-text outperforms Google Cloud Speech API in this context
Combining rule-based and machine learning tagging improves accuracy
J48S decision tree offers competitive performance with high interpretability
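The second finding, combining rule-based and machine learning tagging, can be illustrated with a minimal sketch. This summary does not describe how the paper merges the two sources, so the union strategy below is an assumption, and the tag names, regular expressions, and `toy_model` stand-in are all hypothetical.

```python
import re
from typing import Callable, Dict, Pattern, Set

# Hypothetical rules: tag names and patterns are illustrative only,
# not taken from the paper.
RULES: Dict[str, Pattern[str]] = {
    "greeting": re.compile(r"\b(buongiorno|buonasera|salve)\b", re.IGNORECASE),
    "callback_request": re.compile(r"\brichiam\w+", re.IGNORECASE),
}


def rule_tags(text: str) -> Set[str]:
    """Return every tag whose regular expression matches the transcript."""
    return {tag for tag, pattern in RULES.items() if pattern.search(text)}


def combined_tags(text: str, model_predict: Callable[[str], Set[str]]) -> Set[str]:
    """Union of rule-based hits and model predictions (assumed merge strategy)."""
    return rule_tags(text) | set(model_predict(text))


def toy_model(text: str) -> Set[str]:
    """Stand-in for a trained classifier; a real one would be learned from data."""
    return {"complaint"} if "reclamo" in text.lower() else set()


if __name__ == "__main__":
    print(combined_tags("Buongiorno, vorrei fare un reclamo", toy_model))
```

Taking the union favors recall; a precision-oriented variant could instead require both sources to agree before assigning a tag.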
Abstract
The search for ever deeper and more accurate analysis of customer data is a strong technological trend, appealing to both private and public companies. This is particularly true in the contact center domain, where speech analytics is a powerful methodology for gaining insights from unstructured data coming from conversations between customers and human agents. In this work, we describe an experiment with a speech analytics process for an Italian contact center that deals with call recordings extracted from inbound or outbound flows. First, we illustrate in detail the development of an in-house speech-to-text solution, based on the Kaldi framework, evaluate its performance, and compare it to Google Cloud Speech API. Then, we evaluate and compare different approaches to the semantic tagging of call transcripts, ranging from classic regular expressions to machine…
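Comparisons between speech-to-text systems such as the one mentioned in the abstract are commonly made with word error rate (WER). This summary does not state which metric the paper uses, so WER is an assumption here; the sketch below shows how it is typically computed from a reference transcript and a system hypothesis, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is dropped by the hypothetical recognizer.
    print(word_error_rate("vorrei disdire il contratto", "vorrei disdire contratto"))
```

Lower is better, and values above 1.0 are possible when the hypothesis contains many inserted words.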
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Natural Language Processing Techniques
