A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented Neural Networks
Mohsen Ahmadzadeh, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram

TL;DR
This paper introduces A2P-MANN, a memory-augmented neural network that cuts computation in two ways: a small classifier chooses how many attention inference hops each input query needs, and the weights of the final fully-connected layers are pruned. The approach is evaluated on the bAbI question-answering tasks.
Contribution
The paper presents an online adaptive method for choosing the number of attention inference hops in MANNs, together with two weight-pruning approaches for the final FC layers: one with negligible accuracy loss and one with controllable accuracy loss.
Findings
Over 42% fewer computations on average compared to baseline MANN.
Up to 68% reduction in computation when combined with zero-skipping.
Up to 43% runtime reduction on CPU and GPU platforms.
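To make the zero-skipping finding above concrete, here is a minimal Python sketch of how pruning a fully-connected layer and then skipping zero operands reduces the multiply-accumulate (MAC) count. It assumes simple magnitude-based pruning, which is a common choice but is not confirmed by this summary as the paper's method. The function names `magnitude_prune` and `fc_macs` are hypothetical and exist only for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the smallest
    absolute value. Assumption: magnitude pruning, used here only as an
    illustrative stand-in for the paper's pruning approaches."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def fc_macs(weights, x, zero_skipping=False):
    """Count the MACs needed for y = weights @ x.

    Without zero-skipping every weight costs one MAC. With zero-skipping,
    a MAC is only counted when both the weight and the input are nonzero."""
    if not zero_skipping:
        return int(weights.size)
    active = (weights != 0) & (x != 0)[np.newaxis, :]
    return int(active.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(20, 50))               # toy FC layer, not from the paper
    x = np.maximum(rng.normal(size=50), 0.0)    # ReLU-like input with zeros
    w_pruned = magnitude_prune(w, sparsity=0.5)
    dense = fc_macs(w, x)
    skipped = fc_macs(w_pruned, x, zero_skipping=True)
    print(f"dense MACs: {dense}, with pruning + zero-skipping: {skipped}")
```

The savings reported by such a count depend entirely on the chosen sparsity and the input's zero pattern; the toy numbers here say nothing about the percentages reported in the paper.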
Abstract
In this work, to limit the number of required attention inference hops in memory-augmented neural networks, we propose an online adaptive approach called A2P-MANN. A small neural network classifier determines an adequate number of attention inference hops for the input query. The technique eliminates a large number of unnecessary computations in extracting the correct answer. In addition, to further lower computations in A2P-MANN, we suggest pruning the weights of the final FC (fully-connected) layers. To this end, two pruning approaches are developed, one with negligible accuracy loss and the other with controllable loss on the final accuracy. The efficacy of the technique is assessed using the twenty question-answering (QA) tasks of the bAbI dataset. The analytical assessment reveals, on average, more than 42% fewer computations compared to the baseline…
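The adaptive-hop idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an end-to-end memory-network style hop (softmax attention over memory, then a residual update of the controller state), and it assumes the classifier is consulted after each hop to decide whether to stop. The names `adaptive_hop_inference` and `make_hop_classifier`, the `threshold` parameter, and the single logistic unit used as the classifier are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def make_hop_classifier(w, b):
    """Hypothetical stand-in for the small classifier: one logistic unit
    that scores whether the current controller state is already adequate."""
    def classify(u):
        return 1.0 / (1.0 + np.exp(-(w @ u + b)))
    return classify


def adaptive_hop_inference(query, memory_in, memory_out, hop_classifier,
                           max_hops=3, threshold=0.5):
    """Run attention hops until the classifier signals that no more are needed.

    query:      (d,)   embedded question
    memory_in:  (n, d) memory embeddings used to compute attention
    memory_out: (n, d) memory embeddings that are read out
    Returns the final controller state and the number of hops executed.
    """
    u = query
    hops_used = 0
    for _ in range(max_hops):
        attention = softmax(memory_in @ u)   # attention weights over n slots
        o = attention @ memory_out           # weighted memory read
        u = u + o                            # residual state update
        hops_used += 1
        # Assumption: stop as soon as the classifier is confident enough.
        if hop_classifier(u) >= threshold:
            break
    return u, hops_used


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 8, 5                              # toy sizes, not from the paper
    q = rng.normal(size=d)
    m_in, m_out = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    clf = make_hop_classifier(rng.normal(size=d), 0.0)
    _, hops = adaptive_hop_inference(q, m_in, m_out, clf)
    print("hops used:", hops)
```

Every hop that is skipped saves one attention computation over the whole memory, which is where the reduction in computations would come from in a scheme of this kind.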
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Pruning
