DeepCoT: Deep Continual Transformers for Real-Time Inference on Data Streams
Ginés Carreto Picón, Peng Yuan Zhou, Qi Zhang, and Alexandros Iosifidis

TL;DR
DeepCoT introduces an attention mechanism for deep Transformer encoders that removes the redundant computation of sliding-window inference on data streams, keeping performance comparable to non-continual baselines while running significantly faster.
Contribution
The paper proposes DeepCoT, a redundancy-free encoder attention mechanism that applies to deep models and enables efficient real-time inference on resource-limited devices.
Findings
DeepCoT achieves up to 100x faster inference than previous models.
It maintains comparable accuracy to non-continual Transformers.
Experiments on audio, video, and text streams validate its effectiveness.
Abstract
Transformer-based models have dramatically increased their size and parameter count to tackle increasingly complex tasks. At the same time, there is a growing demand for high-performance, low-latency inference on devices with limited resources. In particular, stream data inference is typically performed over a sliding temporal window, leading to highly redundant computations. While the recent Continual Transformers started addressing this issue, they can be effectively used only in shallow models, which limits their scope and generalization power. In this paper, we propose the Deep Continual Transformer (DeepCoT), a redundancy-free encoder attention mechanism that can be applied over existing deep encoder architectures with minimal changes. In our experiments over audio, video, and text streams, we show that DeepCoTs retain comparable performance to their non-continual baselines while…
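To make the sliding-window redundancy concrete, here is a minimal sketch of a single attention layer. It is not the DeepCoT mechanism, which the abstract does not spell out. It only illustrates the general idea of caching keys and values so that each new token costs one query instead of a full window recomputation. All names (`matvec`, `attend`, `full_window_attention`, `SingleOutputAttention`) are hypothetical, and the sketch assumes a single head, no positional encoding, and a stride of one token.

```python
import math
from collections import deque


def matvec(W, x):
    """Project vector x with matrix W (a list of rows): returns W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def full_window_attention(window, Wq, Wk, Wv):
    """Baseline: recompute every query, key, and value in the window."""
    keys = [matvec(Wk, x) for x in window]
    values = [matvec(Wv, x) for x in window]
    return [attend(matvec(Wq, x), keys, values) for x in window]


class SingleOutputAttention:
    """Illustrative cached layer: one new token in, one output out."""

    def __init__(self, Wq, Wk, Wv, window_size):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        # Bounded deques drop the oldest key/value once the window is full.
        self.keys = deque(maxlen=window_size)
        self.values = deque(maxlen=window_size)

    def step(self, x):
        # Only the newest token is projected; older projections are reused.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), list(self.keys), list(self.values))
```

For a window of n tokens, the baseline computes n² attention scores per step, while the cached layer computes n. Once the cache is full, `step` returns the same vector as the last row of `full_window_attention` on the current window. That equivalence holds for one layer only: in a stack, every intermediate token depends on the whole window, which is the limitation of shallow continual models that the abstract points to.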
