Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

TL;DR
This paper investigates whether CNNs trained with Patch Mixing data augmentation can mimic the patch selectivity of ViTs, improving how well they handle occlusion and ignore out-of-context patches.
Contribution
The study introduces Patch Mixing as a data augmentation technique to instill ViT-like patch selectivity in CNNs, enhancing their robustness to occlusion.
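This summary does not spell out the exact augmentation procedure, so the following is only a minimal sketch of one plausible form of patch-level mixing. It assumes that images are split into a regular grid, that a random subset of grid cells is overwritten with the corresponding cells of another image in the same batch, and that labels are interpolated by the fraction of cells replaced. The function name `patch_mix` and the parameters `patch_size` and `mix_prob` are hypothetical and are not taken from the paper.

```python
import numpy as np


def patch_mix(images, labels, num_classes, patch_size=16, mix_prob=0.3, rng=None):
    """Illustrative patch-level mixing (an assumption, not the authors' code).

    images: float array of shape (B, H, W, C); H and W divisible by patch_size.
    labels: int array of shape (B,) with values in [0, num_classes).
    Returns (mixed_images, soft_labels) with soft_labels of shape (B, num_classes).
    """
    rng = np.random.default_rng() if rng is None else rng
    b, h, w, _ = images.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size

    # Pair each image with a partner from the same batch.
    # A random permutation may occasionally pair an image with itself.
    perm = rng.permutation(b)

    # Patch-level mask: True means "take this grid cell from the partner image".
    grid_mask = rng.random((b, grid_h, grid_w)) < mix_prob

    # Expand the patch-level mask to pixel resolution, shape (B, H, W, 1).
    pixel_mask = np.repeat(np.repeat(grid_mask, patch_size, axis=1), patch_size, axis=2)
    pixel_mask = pixel_mask[..., None]

    mixed = np.where(pixel_mask, images[perm], images)

    # Interpolate one-hot labels by the fraction of cells that were replaced.
    lam = grid_mask.reshape(b, -1).mean(axis=1)
    one_hot = np.eye(num_classes)[labels]
    soft = (1.0 - lam)[:, None] * one_hot + lam[:, None] * one_hot[perm]
    return mixed, soft
```

In a training loop, a call such as `patch_mix(batch_images, batch_labels, num_classes=1000)` would be applied per batch and the returned soft labels used with a cross-entropy loss that accepts class probabilities. Setting `mix_prob=0.0` leaves the batch unchanged, which is a convenient sanity check.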
Findings
CNNs trained with Patch Mixing ignore out-of-context patches better
Patch Mixing improves CNNs' performance on occlusion benchmarks
ViTs' performance remains unchanged with Patch Mixing
Abstract
Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name patch selectivity), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can…
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Visual Attention and Saliency Detection
