Knowledge boosting during low-latency inference

Vidya Srinivas; Malek Itani; Tuochao Chen; Sefik Emre Eskimez; Takuya Yoshioka; Shyamnath Gollakota

arXiv:2407.11055·cs.LG·July 26, 2024

TL;DR

This paper introduces knowledge boosting, a technique enabling small models to benefit from large models' knowledge during low-latency streaming inference despite communication delays, improving performance in speech tasks.

Contribution

The paper proposes knowledge boosting, a method in which a large remote model operating on time-delayed input enhances a small on-device model's inference in real-time streaming applications.

Findings

01 Gains are larger when the performance gap between the small and large models is wide.

02 Effective for speech separation and enhancement tasks with communication delays of up to six 8 ms chunks (48 ms).

03 Demonstrates the feasibility of large-small model collaboration in low-latency scenarios.

Abstract

Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small…
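The timing constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: `large_model` and `small_model` are hypothetical toy stand-ins, and the queue only models the fact that the hint available at chunk t was computed from chunk t minus the delay. The 8 ms chunk size is the only value taken from the abstract.

```python
from collections import deque

CHUNK_MS = 8  # chunk duration stated in the abstract


def delay_ms(delay_chunks, chunk_ms=CHUNK_MS):
    """Communication delay in milliseconds for a delay given in chunks."""
    return delay_chunks * chunk_ms


def large_model(chunk):
    # Hypothetical stand-in for the remote large model: emits a scalar "hint".
    return sum(chunk) / len(chunk)


def small_model(chunk, hint):
    # Hypothetical stand-in for the on-device small model.
    # hint is None until the first delayed hint has arrived.
    bias = 0.0 if hint is None else hint
    return [x - bias for x in chunk]


def stream(chunks, delay_chunks):
    """Process chunks in order; the hint used at step t comes from chunk t - delay_chunks."""
    in_flight = deque()  # hints still "in transit" from the remote model
    outputs = []
    for chunk in chunks:
        in_flight.append(large_model(chunk))
        # A hint becomes usable only after delay_chunks further chunks have passed.
        hint = in_flight.popleft() if len(in_flight) > delay_chunks else None
        outputs.append(small_model(chunk, hint))
    return outputs


if __name__ == "__main__":
    print(delay_ms(6))  # six chunks of 8 ms each
    print(stream([[1.0, 3.0], [2.0, 2.0], [5.0, 5.0]], delay_chunks=1))
```

The small model never waits for the large model here: it always emits an output for the current chunk and uses whatever hint has already arrived, which is the property that keeps the on-device path within its real-time budget.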

Code & Models

Repositories

vysri/knowledge-boosting
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Neural Networks and Applications · Cell Image Analysis Techniques