Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs

Or Sharir; Anima Anandkumar

arXiv:2307.14988·cs.LG·July 28, 2023

TL;DR

This paper introduces an incremental inference method for neural networks, especially transformers, that reuses calculations to efficiently process dynamic inputs with minimal recomputation, significantly reducing computational costs.

Contribution

The paper proposes using vector quantization to enable incremental computation in transformers, improving efficiency while maintaining comparable accuracy.

Findings

1. Achieved a 12.1X median reduction in operations for document edits
2. Maintained comparable accuracy on classification tasks
3. Demonstrated effectiveness on pre-trained language models

Abstract

Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited. Re-running the model each time is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, looking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation, as even minor input changes cascade through the network and restrict information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons, facilitating the reuse of their values. We apply this approach…
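
The reuse idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `quantize`, `IncrementalLayer`, and `codebook`, the two-entry codebook, and the stand-in function `fn` are all hypothetical, and the sketch assumes that a position's downstream result depends only on its own quantized code. It shows only that small perturbations of a hidden vector map to the same codeword, so the cached result for that position can be reused.

```python
import math


def quantize(vec, codebook):
    """Return the index of the codeword nearest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vec, codebook[i]))


class IncrementalLayer:
    """Hypothetical per-position layer that recomputes only when a code changes."""

    def __init__(self, codebook, fn):
        self.codebook = codebook
        self.fn = fn          # stand-in for an expensive downstream computation
        self.codes = []       # last quantized code seen at each position
        self.outputs = []     # cached output at each position

    def update(self, hidden):
        """Process new hidden vectors; return (outputs, positions recomputed)."""
        codes = [quantize(h, self.codebook) for h in hidden]
        if len(codes) != len(self.codes):
            # Assumption: a length change simply invalidates the whole cache.
            self.codes = [None] * len(codes)
            self.outputs = [None] * len(codes)
        recomputed = 0
        for i, code in enumerate(codes):
            if code != self.codes[i]:
                self.outputs[i] = self.fn(self.codebook[code])
                self.codes[i] = code
                recomputed += 1
        return list(self.outputs), recomputed


codebook = [(0.0, 0.0), (1.0, 1.0)]
layer = IncrementalLayer(codebook, fn=lambda c: 2 * sum(c))

# First pass: nothing is cached, so all three positions are computed.
_, first = layer.update([(0.1, -0.1), (0.9, 1.2), (0.0, 0.2)])

# Second pass: every vector moved slightly, but only position 1 changed codeword.
outputs, second = layer.update([(0.15, -0.05), (0.1, 0.1), (0.05, 0.1)])

print(first, second)  # prints "3 1"
```

In this toy setting, all three input vectors change between the two calls, yet only one position is recomputed because the other two still quantize to the same codeword.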

Taxonomy

Topics: Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications