Dynamic Query Selection for Fast Visual Perceiver

Corentin Dancette; Matthieu Cord

arXiv:2205.10873·cs.CV·March 23, 2023·1 citation

Open Access

TL;DR

This paper proposes a method to improve the efficiency of Perceiver models in vision tasks by dynamically reducing the number of query tokens during inference, aiming to decrease complexity without significantly sacrificing accuracy.

Contribution

It introduces a dynamic query selection strategy for Perceiver models that speeds up inference while maintaining performance. It targets inference-time complexity, an aspect of model efficiency that the authors describe as under-explored.
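To make the mechanism concrete, here is a minimal pure-Python sketch of a Perceiver-style forward pass in which only the first `keep` learned latent queries are used at inference. It is an illustration under stated assumptions, not the authors' implementation. The function names (`attention`, `perceiver_forward`) are hypothetical, the layers omit the learned projections, normalization, and MLPs of a real Perceiver, and the static "keep the first `keep` latents" rule is a stand-in chosen for illustration, not the paper's dynamic selection strategy. What it does show is that once the queries are chosen, every layer after the cross-attention operates only on the kept latents, so shrinking `keep` shrinks all of that work.

```python
import math
import random

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    # Scaled dot-product attention WITHOUT learned projections (simplification).
    scale = math.sqrt(len(keys[0]))
    scores = [[s / scale for s in row] for row in matmul(queries, transpose(keys))]
    return matmul(softmax_rows(scores), values)


def perceiver_forward(inputs: Matrix, latents: Matrix, n_layers: int, keep: int) -> list[float]:
    if not 1 <= keep <= len(latents):
        raise ValueError("keep must be between 1 and the number of latents")
    # Hypothetical selection rule for illustration: keep the first `keep` latents.
    x = latents[:keep]
    # Cross-attention: `keep` queries attend over all N inputs (cost ~ N * keep).
    x = attention(x, inputs, inputs)
    for _ in range(n_layers):
        # Latent self-attention with a residual connection (cost ~ keep^2 per layer).
        y = attention(x, x, x)
        x = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
    # Mean-pool the latents so the output size does not depend on `keep`.
    return [sum(col) / len(x) for col in zip(*x)]


random.seed(0)
dim = 8
inputs = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
latents = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(16)]
full = perceiver_forward(inputs, latents, n_layers=2, keep=16)
fast = perceiver_forward(inputs, latents, n_layers=2, keep=4)
print(len(full), len(fast))  # prints: 8 8
```

In this sketch, mean-pooling over the latents gives a fixed-size output whether 16 or 4 queries are kept, so a downstream classifier would not need to change. How much accuracy a smaller query set retains is the question the paper studies; the sketch says nothing about that.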

Findings

01. Reduced inference time with minimal accuracy loss

02. Effective dynamic query selection method demonstrated

03. Maintained competitive performance on vision benchmarks

Abstract

In recent work, Transformers have matched deep convolutional networks as vision architectures. Most work focuses on getting the best results on large-scale benchmarks, and scaling laws seem to be the most successful strategy: bigger models, more data, and longer training result in higher performance. However, the reduction of network complexity and inference time remains under-explored. The Perceiver model offers a solution to this problem: by first performing a cross-attention with a fixed number Q of latent query tokens, the complexity of the L-layer Transformer network that follows is bounded by O(LQ^2). In this work, we explore how to make Perceivers even more efficient by reducing the number of queries Q during inference while limiting the accuracy drop.
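The complexity bound in the abstract can be checked with a back-of-the-envelope count. Because the Q latent queries cross-attend over all N input tokens once, that step costs O(NQ), and each of the L latent self-attention layers costs O(Q^2). The total is O(NQ + LQ^2), versus O(LN^2) for a Transformer applied directly to the N tokens. The sketch below counts only the two matrix products inside each attention operation (scores and weighted sum of values) and ignores projections, softmax, and MLPs. The function names and the example sizes (N = 50,176, i.e. a 224×224 image with one token per pixel, L = 8, d = 512) are illustrative assumptions, not settings from the paper.

```python
def attention_cost(n_queries: int, n_keys: int, dim: int) -> int:
    """Multiply-adds for Q K^T (n_queries x n_keys x dim) plus weights @ V (same size)."""
    return 2 * n_queries * n_keys * dim


def perceiver_cost(n_inputs: int, n_latents: int, n_layers: int, dim: int) -> int:
    cross = attention_cost(n_latents, n_inputs, dim)               # O(NQ), paid once
    latent = n_layers * attention_cost(n_latents, n_latents, dim)  # O(LQ^2)
    return cross + latent


def transformer_cost(n_inputs: int, n_layers: int, dim: int) -> int:
    return n_layers * attention_cost(n_inputs, n_inputs, dim)      # O(LN^2)


N, L, D = 50_176, 8, 512
for q in (512, 256, 128):
    cross = attention_cost(q, N, D)
    latent = perceiver_cost(N, q, L, D) - cross
    print(f"Q={q:4d}  cross-attention={cross:.3e}  latent stack={latent:.3e}")
print(f"full Transformer over N tokens: {transformer_cost(N, L, D):.3e}")
```

Each halving of Q halves the cross-attention term and divides the latent-stack term by four, which is why reducing Q at inference is an attractive lever once a Perceiver has been trained.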

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Label Smoothing