Dynamic Query Selection for Fast Visual Perceiver
Corentin Dancette, Matthieu Cord

TL;DR
This paper proposes a method to improve the efficiency of Perceiver models in vision tasks by dynamically reducing the number of query tokens during inference, aiming to decrease complexity without significantly sacrificing accuracy.
Contribution
It introduces a dynamic query selection strategy for Perceiver models to enhance inference speed while maintaining performance, addressing a less-explored aspect of model efficiency.
Findings
Reduced inference time with minimal accuracy loss
Effective dynamic query selection method demonstrated
Maintained competitive performance on vision benchmarks
Abstract
In recent work, Transformers have matched deep convolutional networks as vision architectures. Most work focuses on getting the best results on large-scale benchmarks, and scaling seems to be the most successful strategy: bigger models, more data, and longer training result in higher performance. However, reducing network complexity and inference time remains under-explored. The Perceiver model offers a solution to this problem: by first performing cross-attention with a fixed number Q of latent query tokens, the complexity of the L-layer Transformer network that follows is bounded by O(LQ^2). In this work, we explore how to make Perceivers even more efficient by reducing the number of queries Q during inference while limiting the accuracy drop.
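The O(LQ^2) bound can be illustrated with a rough cost count. The sketch below is not the paper's implementation or its query selection method; it only counts multiply-accumulates for the attention score and value-mixing steps. The function name `attention_cost` and all sizes (input length, query counts, depth, width) are hypothetical values chosen for illustration.

```python
def attention_cost(n_inputs: int, n_queries: int, n_layers: int, dim: int) -> dict:
    """Rough multiply-accumulate count for attention scores and value mixing only.

    Projections and feed-forward layers are ignored (an assumption made to
    keep the sketch short).
    """
    # Cross-attention: each of the Q latent queries attends to all N input tokens.
    cross = 2 * n_queries * n_inputs * dim
    # Latent self-attention: a Q x Q score matrix in each of the L layers.
    latent = 2 * n_layers * n_queries * n_queries * dim
    return {"cross": cross, "latent": latent, "total": cross + latent}


# Hypothetical sizes: 224 x 224 = 50,176 input tokens, 8 latent layers, width 256.
full = attention_cost(n_inputs=50_176, n_queries=512, n_layers=8, dim=256)
half = attention_cost(n_inputs=50_176, n_queries=256, n_layers=8, dim=256)

print(full["latent"] / half["latent"])  # 4.0: latent stack cost is quadratic in Q
print(full["cross"] / half["cross"])    # 2.0: cross-attention cost is linear in Q
```

Under these assumptions, halving Q at inference cuts the latent self-attention cost by a factor of four and the cross-attention cost by a factor of two, which is why the number of queries is a natural lever for inference speed.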
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Label Smoothing
