Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Gagan Jain; Nidhi Hegde; Aditya Kusupati; Arsha Nagrani; Shyamal Buch; Prateek Jain; Anurag Arnab; Sujoy Paul

arXiv:2407.19985·cs.CV·July 31, 2024·2 cites

TL;DR

The paper introduces Mixture of Nested Experts (MoNE), a dynamic nested-expert framework that reduces the computational cost of visual processing by adaptively prioritizing tokens. It matches the accuracy of baseline models while cutting inference-time compute by more than half.

Contribution

MoNE introduces a nested expert structure that adaptively routes tokens to experts of different cost under a given compute budget, improving efficiency without sacrificing performance.
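
To make "nested experts" concrete, the sketch below assumes one simple way to nest them: every expert is a slice of a single shared weight matrix, and smaller experts use only the leading dimensions. The slicing rule, the zero-padding of the output, and all function names are assumptions made for illustration, not details given in this summary. Because a width-w slice of a d-wide matrix needs (w/d)^2 of the multiply-adds, hypothetical widths of d/4, d/2 and d cost about 1/16, 1/4 and 1 of the full layer, giving the increasing compute ladder along which the abstract places its experts.

```python
import random


def make_shared_weights(d_model, seed=0):
    """Hypothetical full-width weight matrix (d_model x d_model) shared by all nested experts."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(d_model)]


def nested_expert_forward(x, weights, width):
    """Run the nested expert of size `width` on the token vector `x`.

    The expert reads only the first `width` input dimensions and uses the
    top-left width x width block of the shared weights. Its output is
    zero-padded back to d_model (an assumption of this sketch), so every
    expert returns a vector of the same length.
    """
    d_model = len(weights)
    if not 0 < width <= d_model:
        raise ValueError("width must be in (0, d_model]")
    out = [0.0] * d_model
    for i in range(width):
        out[i] = sum(weights[i][j] * x[j] for j in range(width))
    return out


def relative_cost(width, d_model):
    """Multiply-adds of a width x width block relative to the full d_model x d_model matmul."""
    return (width / d_model) ** 2


if __name__ == "__main__":
    d_model = 8
    w = make_shared_weights(d_model)
    x = [float(i) for i in range(d_model)]
    for width in (2, 4, 8):  # hypothetical expert widths d/4, d/2, d
        y = nested_expert_forward(x, w, width)
        print(width, relative_cost(width, d_model), [round(v, 3) for v in y[:3]])
```

Since the smaller experts reuse a subset of the largest expert's weights, adding cheaper experts in this sketch adds no parameters, which is the contrast with standard MoE that the abstract draws.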

Findings

01

Over two-fold reduction in inference-time compute.

02

Maintains strong performance across different compute budgets.

03

Validated on image and video datasets such as ImageNet-21K, Kinetics-400, and Something-Something-v2.

Abstract

The visual medium (images and videos) naturally contains a large amount of information redundancy, thereby providing a great opportunity for leveraging efficiency in processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate scalability while maintaining the same inference-time costs, but they come with a larger parameter footprint. We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve. Given a compute budget, MoNE learns to dynamically choose tokens in a priority order, and thus redundant tokens are processed through cheaper nested experts. Using this framework, we achieve equivalent performance as the…
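
The abstract's routing idea (tokens chosen in priority order, with redundant ones sent to cheaper nested experts) can be sketched as a greedy assignment. This is a minimal sketch under assumptions not stated in this summary: a router has already produced one score per token per expert, the compute budget is expressed as a fixed fraction of tokens per expert, and the largest expert claims its highest-scoring tokens first, followed by progressively smaller experts. All helper names are hypothetical; this is not taken from the authors' code or the mone-pytorch repository.

```python
def capacities_from_fractions(n_tokens, fractions):
    """Turn per-expert token fractions (smallest expert first, summing to 1)
    into integer capacities. Rounding leftovers go to the smallest expert,
    a choice made only for this sketch."""
    caps = [int(f * n_tokens) for f in fractions]
    caps[0] += n_tokens - sum(caps)
    return caps


def route_tokens(scores, capacities):
    """Greedily assign tokens to nested experts in priority order.

    scores[t][e]: router score of token t for expert e (experts ordered
    from smallest to largest). capacities[e]: how many tokens expert e
    may take. The largest expert picks first among unassigned tokens.
    """
    n_tokens = len(scores)
    if sum(capacities) != n_tokens:
        raise ValueError("capacities must sum to the number of tokens")
    assignment = [None] * n_tokens
    for e in reversed(range(len(capacities))):
        unassigned = [t for t in range(n_tokens) if assignment[t] is None]
        # Highest-scoring remaining tokens for this expert come first.
        unassigned.sort(key=lambda t: scores[t][e], reverse=True)
        for t in unassigned[:capacities[e]]:
            assignment[t] = e
    return assignment


def expected_compute(fractions, expert_costs):
    """Average per-token cost relative to always using the full expert (cost 1.0)."""
    return sum(f * c for f, c in zip(fractions, expert_costs))


if __name__ == "__main__":
    # Four hypothetical tokens, two nested experts (small, full).
    scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    fractions = [0.5, 0.5]
    caps = capacities_from_fractions(len(scores), fractions)
    print(route_tokens(scores, caps))
    print(expected_compute(fractions, [0.25, 1.0]))
```

Here the full expert takes tokens 1 and 3 (its two highest scores) and the small expert gets the rest. With a small expert costing 0.25 of the full one, this split uses 0.625 of the full compute. In this sketch, shifting more of the token fraction toward smaller experts is the lever that meets a tighter budget.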

Code & Models

Repositories

usryokousha/mone-pytorch
pytorch

Taxonomy

Topics: Data Visualization and Analytics · Neural Networks and Applications · Urban Planning and Valuation

Methods: Attention Is All You Need · Label Smoothing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention