Real-Time Voicemail Detection in Telephony Audio Using Temporal Speech Activity Features

Kumar Saurav

arXiv:2604.09675·cs.SD·April 14, 2026

TL;DR

This paper introduces a fast, lightweight system that uses temporal speech activity features from a neural VAD to accurately detect voicemails in telephony calls in real time, suitable for large-scale deployment.

Contribution

The author proposes an efficient approach that combines temporal speech activity features from a pre-trained neural VAD with a shallow tree-based classifier, achieving high accuracy without complex processing or transcription.

Findings

01

Achieved 96.1% combined accuracy (734/764) in voicemail detection across two evaluation sets of telephony recordings.

02

Maintained a 0.3% false positive rate and a 1.3% false negative rate in production validation over 77,000 calls.

03

End-to-end inference runs in 46 ms on a commodity dual-core CPU with no GPU, supporting 380+ concurrent WebSocket calls.

Abstract

Outbound AI calling systems must distinguish voicemail greetings from live human answers in real time to avoid wasted agent interactions and dropped calls. We present a lightweight approach that extracts 15 temporal features from the speech activity pattern of a pre-trained neural voice activity detector (VAD), then classifies with a shallow tree-based ensemble. Across two evaluation sets totaling 764 telephony recordings, the system achieves a combined 96.1% accuracy (734/764), with 99.3% (139/140) on an expert-labeled test set and 95.4% (595/624) on a held-out production set. In production validation over 77,000 calls, it maintained a 0.3% false positive rate and 1.3% false negative rate. End-to-end inference completes in 46 ms on a commodity dual-core CPU with no GPU, supporting 380+ concurrent WebSocket calls. In our search over 3,780 model, feature, and threshold combinations,…
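To make the idea of temporal speech activity features concrete, here is a minimal sketch in Python. It assumes the VAD emits one speech probability per fixed-length frame. The function name `temporal_features`, the 0.5 threshold, the frame length, and the five features shown are illustrative assumptions; the abstract does not list the paper's 15 features, and this is not the author's implementation.

```python
from itertools import groupby


def temporal_features(probs, frame_ms=32.0, threshold=0.5):
    """Summarize a frame-level speech-probability track as temporal features.

    All feature names here are hypothetical examples, not the paper's set.
    """
    # Binarize the VAD output into speech / non-speech frames.
    active = [p >= threshold for p in probs]
    # Run-length encode the activity track: [(is_speech, n_frames), ...].
    runs = [(flag, sum(1 for _ in grp)) for flag, grp in groupby(active)]
    speech = [n for flag, n in runs if flag]
    # A pause is a silent run strictly between two speech runs.
    pauses = [
        n for i, (flag, n) in enumerate(runs)
        if not flag and 0 < i < len(runs) - 1
    ]
    # Leading silence; equals the full duration if no speech is detected.
    leading = runs[0][1] if runs and not runs[0][0] else 0
    return {
        "speech_ratio": sum(speech) / len(active) if active else 0.0,
        "time_to_first_speech_ms": leading * frame_ms,
        "num_speech_segments": len(speech),
        "longest_speech_ms": max(speech, default=0) * frame_ms,
        "longest_pause_ms": max(pauses, default=0) * frame_ms,
    }


# Example: eight 10 ms frames with two speech bursts separated by one pause.
features = temporal_features(
    [0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.8, 0.1], frame_ms=10.0
)
```

A feature dictionary like this could then be passed as a fixed-length vector to a shallow tree-based ensemble. The intuition, stated here as an assumption, is that a voicemail greeting tends to produce long uninterrupted speech runs, while a live answer is typically a short utterance followed by silence.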
