Speeding Up Image Classifiers with Little Companions
Yang Liu, Kowshik Thopalli, Jayaraman Thiagarajan

TL;DR
This paper introduces a simple two-pass 'Little-Big' approach that uses a lightweight model to handle easy samples and a larger model for difficult ones, significantly reducing computation without sacrificing accuracy.
Contribution
The paper proposes a model-agnostic two-pass algorithm that drastically reduces MACs in image classifiers by focusing large models only on difficult samples.
Findings
Achieves up to 81% MACs reduction across various models.
Maintains accuracy while reducing computation.
Speeds up models by over 60% with minimal accuracy loss.
Abstract
Scaling up neural networks has been a key recipe for the success of large language and vision models. However, in practice, up-scaled models can be disproportionately costly in terms of computation while providing only marginal improvements in performance; for example, EfficientViT-L3-384 achieves <2% improvement on ImageNet-1K accuracy over the base L1-224 model, while requiring more multiply-accumulate operations (MACs). In this paper, we investigate the scaling properties of popular families of neural networks for image classification, and find that scaled-up models mostly help with "difficult" samples. Decomposing the samples by difficulty, we develop a simple model-agnostic two-pass Little-Big algorithm that first uses a lightweight "little" model to make predictions for all samples, and only passes the difficult ones to the "big" model to solve. Good little companion achieve…
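The two-pass idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "difficulty" is judged by the little model's maximum softmax probability falling below a threshold, and the names `little_big_predict`, `expected_macs`, `little_logits_fn`, `big_logits_fn`, and `threshold` are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def little_big_predict(little_logits_fn, big_logits_fn, x, threshold):
    """Two-pass prediction: little model on everything, big model on hard samples.

    Assumption: a sample is "difficult" when the little model's maximum
    softmax probability is below `threshold`.
    """
    probs = softmax(little_logits_fn(x))
    preds = probs.argmax(axis=1)
    confidence = probs.max(axis=1)

    # Second pass: only low-confidence samples are sent to the big model.
    hard = confidence < threshold
    if hard.any():
        preds[hard] = big_logits_fn(x[hard]).argmax(axis=1)
    return preds, hard


def expected_macs(little_macs, big_macs, hard_fraction):
    # Every sample pays for the little model; only hard ones also pay for the big one.
    return little_macs + hard_fraction * big_macs


# Toy usage: the "models" are stand-in functions that return logits directly.
x = np.array([[5.0, 0.0], [0.1, 0.0], [0.0, 4.0]])
preds, hard = little_big_predict(lambda a: a, lambda a: -a, x, threshold=0.9)
print(preds, hard, expected_macs(1.0, 10.0, hard.mean()))
```

In this sketch the average cost per sample is the little model's MACs plus the big model's MACs weighted by the fraction of samples passed on, so the savings depend on how many samples the little model handles confidently.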
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
(1) The motivation and method of Little-Big are very simple and straightforward.
(2) Little-Big appears very easy to implement. In addition, it is model-agnostic and can be applied to models of different scales and architectures.
(3) Little-Big can accelerate a pre-trained model without introducing additional training cost.
(1) Lack of novelty. As the authors say, Little-Big is an embarrassingly simple method that pairs a large model with a lightweight model for image classification. This is the major advantage but also the major disadvantage of Little-Big. Many previous works share a similar motivation and use different networks for acceleration, such as early exiting and speculative decoding, as you mentioned in the paper, although these works mostly include specific and delicate designs. I unde
- The proposed Little-Big algorithm is conceptually straightforward and easy to implement. It requires minimal modifications to existing models and training pipelines.
- The paper demonstrates significant MACs reduction across a range of model architectures (CNNs, transformers, hybrids) and scales, suggesting broad applicability.
- Experiments are conducted on multiple datasets (ImageNet-1K, ImageNet-ReaL, ImageNet-V2) to evaluate the robustness and generalizability of the method.
- The Little-B
# Major
- The method seems to rely on finding an optimal threshold T on the test set (the ImageNet validation set) to determine which samples are passed to the Big model. This raises concerns about potential overfitting to the validation set and its impact on generalization performance. Results should be provided using a threshold determined on the training set or a held-out portion of the validation set to address this concern.
- The paper could benefit from a more comprehensive discussion of related w
S1) The proposed method is practical. It is easy to implement and does not require any modification or additional training of existing models.
S2) It is widely applicable and readily available for any classification problem. It could be applied to other tasks as well, given suitable confidence estimation methods for them.
S3) Extensive experimental results show that the proposed method performs robustly on the ImageNet classification task.
W1) The proposed approach lacks novelty. The idea of using multiple models with different cost-accuracy tradeoffs is highly common; examples include speculative decoding for language models and cascade ranking systems for recommendation and information retrieval.
W2) The experiments are weak. All the experiments concern the ImageNet-1k image classification task, so it is quite uncertain whether this method works well for other tasks.
- The paper studies an important topic: the efficiency of visual recognition models.
- The speedup claimed by the paper is substantial. At a fixed accuracy, their method improves speed by 30%-80% (Figure 1).
- The paper is written clearly and is easily understandable. The investigation studies the natural questions that arise with threshold tuning for the Little model's confidence. Figure 3 clearly demonstrates how the accuracy and efficiency change as a function of the threshold.
- The p
My main concern is the comparison with prior art. As lines 437-439 state, "even with tricks that effectively retrained models, many pruning methods are not competitive... with modern baselines... which in essence are better trained ViTs". It seems that the Little-Big method is evaluated using modern architectures and training recipes, whereas other baselines (pruning, etc.) use older architectures or training recipes. I'm worried that the gains of this method are primarily attributable to
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Balanced Selection
