Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard

TL;DR
This paper focuses on improving the efficiency of deep neural networks during online inference by introducing Continual Inference Networks (CINs) and structured pruning techniques, reducing computational costs while maintaining accuracy.
Contribution
It proposes Continual Inference Networks (CINs) for online processing and introduces structured pruning adapters for efficient model adaptation and acceleration.
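To make the idea of continual inference concrete, here is a minimal Python sketch of a single-channel temporal convolution that consumes one frame at a time. It assumes the continual layer simply keeps a FIFO buffer of the last k inputs and emits one output per step. The names `conv1d_valid` and `ContinualConv1d` are hypothetical, and the dissertation's implementation may cache intermediate results differently. The point is that, after a short initialization phase, the streamed outputs match a regular convolution over the whole sequence, while each step costs only k multiply-accumulates instead of reprocessing a full window.

```python
from collections import deque
from typing import List, Optional


def conv1d_valid(x: List[float], w: List[float], b: float = 0.0) -> List[float]:
    """Regular (offline) 1D convolution without padding (cross-correlation, as in DNN libraries)."""
    k = len(w)
    return [sum(w[i] * x[t + i] for i in range(k)) + b for t in range(len(x) - k + 1)]


class ContinualConv1d:
    """Hypothetical continual 1D convolution that processes one time step per call."""

    def __init__(self, w: List[float], b: float = 0.0) -> None:
        self.w = list(w)
        self.b = b
        # Holds the last k inputs, oldest first; older inputs are evicted automatically.
        self.buffer: deque = deque(maxlen=len(self.w))

    def step(self, x_t: float) -> Optional[float]:
        self.buffer.append(x_t)
        if len(self.buffer) < len(self.w):
            return None  # initialization phase: not enough history for a full kernel
        # Exactly k multiply-accumulates per step, independent of the sequence length.
        return sum(wi * xi for wi, xi in zip(self.w, self.buffer)) + self.b


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    w = [0.5, -1.0, 2.0]
    layer = ContinualConv1d(w, b=0.1)
    streamed = [y for y in map(layer.step, signal) if y is not None]
    print(streamed == conv1d_valid(signal, w, b=0.1))  # True
```

Feeding `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]` through `ContinualConv1d([0.5, -1.0, 2.0], b=0.1)` returns `None` for the first two steps and then the same four values that `conv1d_valid` computes offline.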
Findings
CINs improve online inference efficiency by an order of magnitude.
3D CNNs, ST-GCNs, and Transformers are reformulated as CINs.
Structured pruning adapters outperform fine-tuning in accuracy with fewer weights.
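The structured pruning adapter idea can be sketched for a single frozen linear layer. This is a hedged illustration that assumes a LoRA-style low-rank adapter (trainable `A` and `B`) on top of frozen base weights `W`, combined with structured pruning that removes whole output channels from both the base weights and the adapter. The class `PrunedLowRankAdapter` and its parameters are hypothetical, and this is not necessarily the dissertation's exact formulation. It illustrates only the weight-count side of the finding: per-task storage is the small adapter rather than a full copy of `W`, and the pruned layer can be folded into a smaller dense matrix for faster inference. It says nothing about accuracy.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


class PrunedLowRankAdapter:
    """Hypothetical structured pruning adapter for a frozen linear layer y = W x.

    W:    frozen base weights (d_out x d_in), shared by all tasks
    A:    trainable down-projection (r x d_in)
    B:    trainable up-projection (d_out x r)
    keep: indices of output channels that survive structured pruning
    """

    def __init__(self, W: Matrix, A: Matrix, B: Matrix, keep: List[int]) -> None:
        self.W = W
        self.A = A
        self.keep = sorted(keep)
        # Structured pruning drops whole rows, so the matching adapter rows are dropped too.
        self.B = [B[i] for i in self.keep]

    def forward(self, x: List[float]) -> List[float]:
        z = matvec(self.A, x)      # r-dimensional bottleneck
        delta = matvec(self.B, z)  # low-rank update for kept channels only
        base = matvec([self.W[i] for i in self.keep], x)  # pruned channels are never computed
        return [b + d for b, d in zip(base, delta)]

    def merged(self) -> Matrix:
        """Fold the adapter into the kept rows: W'[i] = W[i] + B[i] A."""
        rank = len(self.A)
        return [
            [self.W[i][j] + sum(row_b[k] * self.A[k][j] for k in range(rank))
             for j in range(len(self.W[i]))]
            for row_b, i in zip(self.B, self.keep)
        ]

    def adapter_weights(self) -> int:
        """Per-task trainable weights: all of A plus the kept rows of B."""
        rank, d_in = len(self.A), len(self.A[0])
        return rank * d_in + len(self.keep) * rank


if __name__ == "__main__":
    W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    A = [[1.0, 1.0, 1.0, 1.0]]
    B = [[0.5], [0.0], [-1.0], [2.0]]
    layer = PrunedLowRankAdapter(W, A, B, keep=[2, 0])
    x = [1.0, 2.0, 3.0, 4.0]
    print(layer.forward(x), matvec(layer.merged(), x))
    print(layer.adapter_weights(), "adapter weights vs", len(W) * len(W[0]), "for fine-tuning")
```

With a 4×4 identity `W`, rank r = 1, and two kept channels, the adapter stores 6 trainable weights versus 16 for a fine-tuned copy of `W`, and `merged()` returns a 2×4 matrix that gives the same outputs as `forward`.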
Abstract
The capabilities and adoption of deep neural networks (DNNs) grow at an exhilarating pace: Vision models accurately classify human actions in videos and identify cancerous tissue in medical scans as precisely as human experts; large language models answer wide-ranging questions, generate code, and write prose, becoming the topic of everyday dinner-table conversations. Even though their uses are exhilarating, the continually increasing model sizes and computational complexities have a dark side. The economic cost and negative environmental externalities of training and serving models are in evident disharmony with financial viability and climate action goals. Instead of pursuing yet another increase in predictive performance, this dissertation is dedicated to the improvement of neural network efficiency. Specifically, a core contribution addresses the efficiency aspects during online…
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Byte Pair Encoding
