An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts
Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan TN,, Simon Corston-Oliver

TL;DR
This paper introduces a robust NER system for noisy business call transcripts, combining fine-tuning of LUKE and distillation to create a performant, real-time model suitable for commercial deployment on CPUs.
Contribution
It proposes a novel training approach using LUKE as a teacher to improve a smaller DistilBERT-based student model for noisy speech transcripts.
Findings
High accuracy achieved on noisy transcripts
Real-time performance on CPU hardware
Effective use of weakly labeled data
Abstract
We present a simple yet effective method to train a named entity recognition (NER) model that operates on business telephone conversation transcripts that contain noise due to the nature of spoken conversation and artifacts of automatic speech recognition. We first fine-tune LUKE, a state-of-the-art Named Entity Recognition (NER) model, on a limited amount of transcripts, then use it as the teacher model to teach a smaller DistilBERT-based student model using a large amount of weakly labeled data and a small amount of human-annotated data. The model achieves high accuracy while also satisfying the practical constraints for inclusion in a commercial telephony product: realtime performance when deployed on cost-effective CPUs rather than GPUs.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
