DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs

Donghyun Kim; Byeongho Heo; Dongyoon Han

arXiv:2403.19588 · cs.CV · August 8, 2024 · 2 cites

TL;DR

This paper revitalizes DenseNets by refining training methods and architecture, demonstrating they can outperform modern models like Swin Transformer and ConvNeXt on ImageNet-1K and other tasks.

Contribution

The paper introduces improved training recipes and architectural adjustments (a block redesign and wider, more memory-efficient networks that keep concatenation shortcuts), which let DenseNets surpass Swin Transformer, ConvNeXt, and DeiT-III.

Findings

01. DenseNets outperform Swin Transformer, ConvNeXt, and DeiT-III.

02. Models achieve near state-of-the-art results on ImageNet-1K.

03. Empirical analysis favors concatenation over additive shortcuts.
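
The difference between the two shortcut types named in the last finding can be illustrated with a toy sketch. It is not the paper's architecture: `toy_layer`, `additive_stack`, `dense_stack`, and the `growth_rate` value are hypothetical names and settings chosen for illustration, and a feature map is reduced to a plain list of channel values at a single position. The sketch only shows that an additive shortcut keeps the channel width fixed and overwrites the input, while a concatenation shortcut keeps earlier features intact and widens the representation at every layer.

```python
import random


def toy_layer(x, out_channels, seed):
    """Hypothetical stand-in for a conv block: random linear map followed by ReLU."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in x] for _ in range(out_channels)]
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def additive_stack(x, depth):
    """Additive (ResNet-style) shortcut: x <- x + f(x); width stays fixed."""
    for i in range(depth):
        fx = toy_layer(x, len(x), seed=i)
        x = [a + b for a, b in zip(x, fx)]
    return x


def dense_stack(x, depth, growth_rate):
    """Concatenation (DenseNet-style) shortcut: x <- concat(x, f(x))."""
    for i in range(depth):
        fx = toy_layer(x, growth_rate, seed=i)
        x = x + fx  # list concatenation plays the role of channel concatenation
    return x


x0 = [1.0, -0.5, 0.25, 2.0]
res_out = additive_stack(x0, depth=3)
dense_out = dense_stack(x0, depth=3, growth_rate=2)

print(len(res_out), len(dense_out))  # 4 10
print(dense_out[:4] == x0)           # True: the input features are carried through unchanged
```

With 4 input channels, 3 layers, and an assumed growth rate of 2, the dense stack ends with 4 + 3 × 2 = 10 channels, which is why the width and memory cost of concatenation shortcuts have to be managed explicitly.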

Abstract

This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals the underrated effectiveness over predominant ResNet-style architectures. We believe DenseNets' potential was overlooked due to untouched training methods and traditional design elements not fully revealing their capabilities. Our pilot study shows dense connections through concatenation are strong, demonstrating that DenseNets can be revitalized to compete with modern architectures. We methodically refine suboptimal components - architectural adjustments, block redesign, and improved training recipes towards widening DenseNets and boosting memory efficiency while keeping concatenation shortcuts. Our models, employing simple architectural elements, ultimately surpass Swin Transformer, ConvNeXt, and DeiT-III - key architectures in the residual learning lineage. Furthermore, our models exhibit near…

Taxonomy

Topics: Neural Networks and Applications

Methods: Attention Is All You Need · RDNet · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Dropout · Multi-Head Attention · Dense Connections