Dynamic Perceiver for Efficient Visual Recognition
Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

TL;DR
The paper introduces Dynamic Perceiver, a dual-branch architecture that enhances early exiting in deep networks by decoupling feature extraction from classification, leading to improved inference efficiency across various tasks and backbones.
Contribution
The paper proposes a novel dual-branch architecture for early exiting, in which bi-directional cross-attention links the two branches. This overcomes the limitations of attaching linear classifiers directly to intermediate layers.
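To make the cross-attention idea concrete, here is a minimal sketch of one bi-directional exchange between a set of feature tokens and a set of latent tokens. It is an illustration under stated assumptions, not the authors' implementation: it uses a single head, no learned projections, a plain residual connection, and an update order (latent first, then features) chosen only for the example. The names `cross_attention` and `bidirectional_step` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query token attends over all key/value tokens.

    Assumption: keys and values are the same vectors and no learned
    projection matrices are applied (a real model would learn them).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        # Scaled dot-product score between this query and every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended = [
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ]
        # Residual connection: add the attended vector to the query.
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def bidirectional_step(features, latents):
    """One exchange: latents read from features, then features read from latents.

    The ordering is an assumption made for this sketch.
    """
    latents = cross_attention(latents, features)
    features = cross_attention(features, latents)
    return features, latents
```

In this sketch the latent tokens play the role of the classification branch and the feature tokens the role of the feature branch; both keep their original shapes after the exchange.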
Findings
Significant improvement in inference efficiency across multiple tasks.
Outperforms existing methods on CPU and GPU platforms.
Versatile framework adaptable to various architectures.
Abstract
Early exiting has become a promising approach to improving the inference efficiency of deep networks. By structuring models with multiple classifiers (exits), predictions for "easy" samples can be generated at earlier exits, negating the need for executing deeper layers. Current multi-exit networks typically implement linear classifiers at intermediate layers, compelling low-level features to encapsulate high-level semantics. This sub-optimal design invariably undermines the performance of later exits. In this paper, we propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task with a novel dual-branch architecture. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Bi-directional cross-attention layers are established to progressively…
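The early-exiting behaviour described in the abstract can be sketched as a loop over exits that stops as soon as one is confident enough. This is a hedged illustration only: the stopping rule (maximum softmax probability against a per-exit threshold) is a common choice assumed here, not one stated in the abstract, and `early_exit_predict`, `make_toy_exit`, and the threshold values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_fns, x, thresholds):
    """Evaluate exits in order and stop at the first confident one.

    Returns (predicted_class, exit_index, confidence). The last exit
    always answers, whatever its confidence.
    Assumption: confidence is the maximum softmax probability.
    """
    if not exit_fns or len(exit_fns) != len(thresholds):
        raise ValueError("need one threshold per exit, and at least one exit")
    last = len(exit_fns) - 1
    for index, (exit_fn, threshold) in enumerate(zip(exit_fns, thresholds)):
        probs = softmax(exit_fn(x))
        confidence = max(probs)
        if confidence >= threshold or index == last:
            return probs.index(confidence), index, confidence


def make_toy_exit(sharpness):
    """Hypothetical stand-in for a classifier: deeper exits scale the logits more."""
    return lambda x: [sharpness * v for v in x]


exits = [make_toy_exit(1.0), make_toy_exit(3.0), make_toy_exit(9.0)]
thresholds = [0.9, 0.9, 0.0]

easy_sample = [4.0, 0.0, 0.0]   # clear evidence for class 0
hard_sample = [0.5, 0.4, 0.0]   # ambiguous between classes 0 and 1

easy_result = early_exit_predict(exits, easy_sample, thresholds)
hard_result = early_exit_predict(exits, hard_sample, thresholds)
```

With these toy values the easy sample leaves at the first exit, while the ambiguous one passes through all three, which is where the compute saving on "easy" inputs comes from.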
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning
