Sparse Double Descent in Vision Transformers: real or phantom threat?

Victor Quétu, Marta Milovanovic, Enzo Tartaglione

arXiv:2307.14253 · cs.CV · September 13, 2023

TL;DR

This paper investigates whether Vision Transformers experience the sparse double descent phenomenon and finds that proper regularization can mitigate it, though with trade-offs in model compression.

Contribution

The study demonstrates that optimal regularization can prevent sparse double descent in ViTs, highlighting a practical approach to improve their generalization.

Findings

1. Optimal $\ell_2$ regularization relieves sparse double descent in ViTs.
2. Properly tuning the regularization sacrifices some model compression.
3. With correct regularization, ViTs are not inherently prone to sparse double descent.
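The findings above hinge on $\ell_2$ regularization. In its common form (an assumption here: this page does not state the paper's exact formulation or the value of the strength parameter), training minimizes the task loss $\mathcal{L}$ plus a squared-norm penalty on the weights $\mathbf{w}$, with $\lambda \ge 0$ controlling how strongly weights are pulled toward zero:

$$
\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \; \mathcal{L}(\mathbf{w}) + \frac{\lambda}{2}\,\lVert \mathbf{w} \rVert_2^2
$$

Setting $\lambda = 0$ recovers unregularized training; the findings concern how $\lambda$ is tuned.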

Abstract

Vision transformers (ViTs) have attracted broad interest in recent theoretical and empirical work. They are state-of-the-art thanks to their attention-based approach, which improves the identification of key features and patterns within images by avoiding inductive bias, resulting in highly accurate image analysis. Meanwhile, recent studies have reported a "sparse double descent" phenomenon in modern deep-learning models, in which extremely over-parametrized models can generalize well. This raises practical questions about the optimal model size and opens the search for the best trade-off between sparsity and performance: are Vision Transformers also prone to sparse double descent? Can we find a way to avoid such a phenomenon? Our work tackles the occurrence of sparse double descent in ViTs. Despite some works that have shown…
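The abstract frames sparse double descent as a relationship between sparsity and performance. Such curves are usually traced by pruning a model to increasing sparsity levels (typically with retraining between steps) and measuring test accuracy at each level; global unstructured magnitude pruning is a common choice for the pruning step. The following is a hypothetical NumPy sketch of that step only, not code from the authors' repository; the function name `global_magnitude_prune` and the dict-of-arrays weight format are assumptions made for illustration.

```python
import numpy as np


def global_magnitude_prune(weights, sparsity):
    """Hypothetical helper: return boolean keep-masks that remove the
    `sparsity` fraction of smallest-magnitude entries across ALL tensors
    (one global ranking, unstructured)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Pool absolute values from every tensor so a single global ranking is used.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    n_prune = int(round(sparsity * all_mags.size))
    if n_prune == 0:
        return {name: np.ones_like(w, dtype=bool) for name, w in weights.items()}
    # Mark exactly n_prune smallest-magnitude entries for removal
    # (a stable sort breaks ties by position, so the count is exact).
    order = np.argsort(all_mags, kind="stable")
    keep_flat = np.ones(all_mags.size, dtype=bool)
    keep_flat[order[:n_prune]] = False
    # Split the flat keep-mask back into per-tensor masks, in dict order.
    masks, start = {}, 0
    for name, w in weights.items():
        masks[name] = keep_flat[start:start + w.size].reshape(w.shape)
        start += w.size
    return masks


if __name__ == "__main__":
    toy = {"a": np.array([1.0, -5.0]), "b": np.array([[0.1, 3.0]])}
    for s in (0.0, 0.5):
        masks = global_magnitude_prune(toy, s)
        kept = sum(int(m.sum()) for m in masks.values())
        print(f"target sparsity {s}: kept {kept} of 4 weights")
```

In a full sweep, each weight tensor would be multiplied by its mask, the model retrained or fine-tuned, and test accuracy recorded per sparsity level; that loop depends on the model and training setup and is omitted here. Ranking globally rather than per layer lets layers end up with different sparsities, which is one common design choice, not necessarily the paper's.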

Code & Models

Repositories

vgcq/sdd_vit
PyTorch · Official