Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression
Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

TL;DR
This paper introduces TACO, a simple and efficient method for compressing large vision models into smaller, specialized models using few-shot learning, enabling faster inference with minimal accuracy loss on specific tasks.
Contribution
The paper presents TACO, a novel few-shot task-aware compression technique that effectively reduces model size and computational cost while maintaining high accuracy on specialized vision tasks.
Findings
TACO achieves up to 20x parameter reduction.
Inference speed increases up to 3x.
Specialized models remain accuracy-competitive.
Abstract
Recent vision architectures and self-supervised training methods enable vision models that are extremely accurate and general, but come with massive parameter and computational costs. In practical settings, such as camera traps, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. These users may wish to make use of modern, highly-accurate models, but are often computationally constrained. To address this, we ask: can we quickly compress large generalist models into accurate and efficient specialists? For this, we propose a simple and versatile technique called Few-Shot Task-Aware Compression (TACO). Given a large vision model that is pretrained to be accurate on a broad task, such as classification over ImageNet-22K, TACO produces a smaller model that is accurate on specialized tasks, such as…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: ConvNeXt · 1x1 Convolution · Residual Connection · Batch Normalization · Max Pooling · Convolution · Residual Block · Bottleneck Residual Block · Average Pooling
