TL;DR
This paper analyzes strategies for scaling convolutional neural networks and finds that a simple compound strategy that scales mainly width is more efficient than traditional scaling methods while reaching similar accuracy.
Contribution
The paper introduces a fast compound scaling method that primarily scales width, resulting in more efficient models with lower activation growth and comparable accuracy.
Findings
Different scaling strategies affect model parameters, activations, and consequently actual runtime quite differently.
Many scaling methods achieve similar accuracy with different resource costs.
The proposed method grows activations at close to a square-root rate in the flops scaling factor, improving efficiency.
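The findings above can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes a plain stack of convolutional layers, where flops grow as depth × width² × resolution², parameters as depth × width², and activations as depth × width × resolution². The helper names `growth` and `width_heavy`, the parameter `alpha`, and the way the flops budget is split are illustrative assumptions, not taken from the text above.

```python
import math

def growth(d=1.0, w=1.0, r=1.0):
    """Growth of flops, params and activations when the depth, width and
    resolution of a plain conv stack are multiplied by d, w and r."""
    return {
        "flops": d * w**2 * r**2,   # each layer costs ~ width^2 * resolution^2
        "params": d * w**2,         # weights do not depend on resolution
        "acts": d * w * r**2,       # output tensors scale with width * resolution^2
    }

def width_heavy(s, alpha=0.8):
    """Hypothetical width-heavy compound rule: a share alpha of the flops
    budget s goes to width, the rest is split evenly between depth and
    resolution. alpha=0.8 is an assumed value for illustration."""
    rest = (1.0 - alpha) / 2.0
    return growth(d=s**rest, w=math.sqrt(s)**alpha, r=math.sqrt(s)**rest)

s = 4.0  # target flops multiplier
strategies = {
    "width only": growth(w=math.sqrt(s)),
    "depth only": growth(d=s),
    "resolution only": growth(r=math.sqrt(s)),
    "width-heavy compound": width_heavy(s),
}

for name, g in strategies.items():
    print(f"{name:22s} flops x{g['flops']:.2f}  "
          f"params x{g['params']:.2f}  acts x{g['acts']:.2f}")
```

Under these assumptions every strategy reaches the same flops budget, but depth-only and resolution-only scaling grow activations linearly in `s`, width-only scaling grows them as the square root of `s`, and the width-heavy mix lands in between, close to the square-root end.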
Abstract
In this work we analyze strategies for convolutional neural network scaling; that is, the process of scaling a base convolutional network to endow it with greater computational complexity and consequently representational power. Example scaling strategies may include increasing model width, depth, resolution, etc. While various scaling strategies exist, their tradeoffs are not fully understood. Existing analysis typically focuses on the interplay of accuracy and flops (floating point operations). Yet, as we demonstrate, various scaling strategies affect model parameters, activations, and consequently actual runtime quite differently. In our experiments we show the surprising result that numerous scaling strategies yield networks with similar accuracy but with widely varying properties. This leads us to propose a simple fast compound scaling strategy that encourages primarily scaling…
Code & Models
- 🤗 kadirnar/timm_model_list (model) · ♡ 1
- 🤗 timm/regnetz_040.ra3_in1k (model) · 79 downloads · ♡ 1
- 🤗 timm/regnetz_040_h.ra3_in1k (model) · 71 downloads · ♡ 1
- 🤗 timm/regnetz_b16.ra3_in1k (model) · 134 downloads · ♡ 1
- 🤗 timm/regnetz_c16.ra3_in1k (model) · 100 downloads · ♡ 1
- 🤗 timm/regnetz_c16_evos.ch_in1k (model) · 78 downloads · ♡ 1
- 🤗 timm/regnetz_d8.ra3_in1k (model) · 203 downloads · ♡ 1
- 🤗 timm/regnetz_d8_evos.ch_in1k (model) · 158 downloads · ♡ 1
- 🤗 timm/regnetz_d32.ra3_in1k (model) · 80 downloads · ♡ 1
- 🤗 timm/regnetz_e8.ra3_in1k (model) · 623 downloads · ♡ 1
