End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English
Abhinav Goyal, Anupam Singh, Nikesh Garera

TL;DR
This paper presents an end-to-end speech-to-intent model for bilingual (Hindi and English) customer support voicebots. The best model outperforms a conventional multi-component pipeline by a relative ~27% on F1 score.
Contribution
The paper introduces an E2E speech-to-intent system that adapts a pre-trained ASR model with a slight modification and fine-tunes it on small annotated datasets, replacing the multi-component pipeline in bilingual customer support applications.
Findings
Best E2E model outperforms a conventional pipeline by a relative ~27% on F1 score
Effective fine-tuning on small datasets
Simplifies deployment and reduces latency
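
The abstract describes the F1 gain as relative, meaning it is measured as a fraction of the pipeline's score rather than as a difference in F1 points. The sketch below shows that arithmetic. The two F1 values are hypothetical, chosen only to produce a 27% relative gain; they are not results from the paper.

```python
def relative_improvement(baseline_f1: float, new_f1: float) -> float:
    """Return the gain of new_f1 over baseline_f1 as a fraction of the baseline."""
    if baseline_f1 <= 0:
        raise ValueError("baseline_f1 must be positive")
    return (new_f1 - baseline_f1) / baseline_f1


# Hypothetical scores for illustration only (not taken from the paper).
pipeline_f1 = 0.60
e2e_f1 = 0.762

gain = relative_improvement(pipeline_f1, e2e_f1)
absolute_gain = e2e_f1 - pipeline_f1
print(f"relative gain: {gain:.1%}, absolute gain: {absolute_gain:.3f} F1")
```

With these made-up numbers, a 27% relative gain corresponds to about 0.16 absolute F1 points, so the two ways of reporting the same result can look quite different.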
Abstract
Automation of on-call customer support relies heavily on accurate and efficient speech-to-intent (S2I) systems. Building such systems as multi-component pipelines poses several challenges: they require large annotated datasets, have higher latency, and are complex to deploy. These pipelines are also prone to compounding errors. To overcome these challenges, we discuss an end-to-end (E2E) S2I model for a customer support voicebot task in a bilingual setting. We show how E2E intent classification can be solved by leveraging a pre-trained automatic speech recognition (ASR) model with a slight modification and fine-tuning on small annotated datasets. Experimental results show that our best E2E model outperforms a conventional pipeline by a relative ~27% on the F1 score.
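
The abstract does not spell out the "slight modification" made to the pre-trained ASR model. One common way to turn a speech encoder into an intent classifier is to pool its frame-level outputs into a single utterance vector and pass that through a small classification head. The dependency-free sketch below illustrates only this idea and is an assumption, not the authors' architecture. The random `frames` stand in for encoder outputs, and the intent names, dimensions, and weights are all hypothetical.

```python
import math
import random


def mean_pool(frames):
    """Average frame-level encoder outputs (T x D) into one D-dim utterance vector."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def linear(vec, weights, bias):
    """Classification head: one logit per intent (weights is num_intents x D)."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(frames, weights, bias, intents):
    """Map encoder frames directly to an intent label, with no transcript in between."""
    probs = softmax(linear(mean_pool(frames), weights, bias))
    best = max(range(len(intents)), key=probs.__getitem__)
    return intents[best], probs


# Assumption: random vectors stand in for the output of a pre-trained ASR encoder.
rng = random.Random(0)
T, D = 50, 8  # hypothetical number of frames and feature size
INTENTS = ["order_status", "cancel_order", "refund"]  # hypothetical intent labels

frames = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
weights = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in INTENTS]  # untrained head
bias = [0.0] * len(INTENTS)

label, probs = predict_intent(frames, weights, bias, INTENTS)
print(label, [round(p, 3) for p in probs])
```

In a setup like this, fine-tuning would update the head (and optionally the encoder) on labeled utterances, which is one reason a small annotated dataset can be enough: the encoder already carries speech representations learned during ASR pre-training.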
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
