An Empirical Evaluation of Encoder Architectures for Fast Real-Time Long Conversational Understanding

Annamalai Senthilnathan; Kristjan Arumae; Mohammed Khalilia; Zhengzheng Xing; Aaron R. Colak

arXiv:2502.12458·cs.CL·February 19, 2025

TL;DR

This paper compares various efficient Transformer variants and CNN-based models for real-time long conversational understanding, finding CNNs to be faster and more memory-efficient while maintaining competitive performance.

Contribution

It provides an empirical evaluation of recent Transformer variants and CNN architectures for long sequence conversational tasks in real-time settings.

Findings

01 CNN models are ~2.6x faster to train

02 CNN inference is ~80% faster

03 CNN models are ~72% more memory efficient

Abstract

Analyzing long text data such as customer call transcripts is a cost-intensive and tedious task. Machine learning methods, namely Transformers, are leveraged to model agent-customer interactions. Unfortunately, Transformers adhere to fixed-length architectures, and their self-attention mechanism scales quadratically with input length. Such limitations make it challenging to leverage traditional Transformers for long sequence tasks, such as conversational understanding, especially in real-time use cases. In this paper we explore and evaluate recently proposed efficient Transformer variants (e.g., Performer, Reformer) and a CNN-based architecture for real-time and near real-time long conversational understanding tasks. We show that CNN-based models are dynamic, ~2.6x faster to train, ~80% faster at inference, and ~72% more memory efficient compared to Transformers on average. Additionally, we…
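The quadratic scaling mentioned in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function names, the single-head default, and the kernel size of 3 are all illustrative assumptions. It only counts the entries of a full self-attention score matrix and the window positions visited by a "same"-padded 1D convolution, to show how each grows with sequence length.

```python
def attention_score_entries(seq_len: int, num_heads: int = 1) -> int:
    """Entries in the full self-attention score matrices.

    Each head compares every token with every other token, giving one
    seq_len x seq_len matrix per head (num_heads is an assumed parameter).
    """
    return num_heads * seq_len * seq_len


def conv1d_window_ops(seq_len: int, kernel_size: int = 3) -> int:
    """Kernel taps applied by a 'same'-padded 1D convolution, per channel pair.

    The kernel slides once over each position, so the count grows linearly
    with seq_len (kernel_size=3 is an assumed, illustrative value).
    """
    return seq_len * kernel_size


if __name__ == "__main__":
    # Doubling the sequence length quadruples the attention count
    # but only doubles the convolution count.
    for n in (512, 1024, 2048, 4096):
        print(n, attention_score_entries(n), conv1d_window_ops(n))
```

These counts ignore hidden dimensions, channel widths, and the approximations used by efficient variants such as Performer and Reformer, so they indicate growth rates only and say nothing about the speed or memory figures reported in the paper.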

Taxonomy

Topics: Speech and dialogue systems