AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
Yitao Xu, Tong Zhang, Sabine Süsstrunk

TL;DR
AdaNCA enhances Vision Transformers by integrating Neural Cellular Automata as adaptable modules, significantly improving robustness against adversarial and out-of-distribution inputs with minimal parameter increase.
Contribution
This work introduces AdaNCA, a novel plug-and-play NCA-based adaptor for ViTs that boosts robustness and performance with efficient interaction learning and optimal insertion strategies.
Findings
Over 10% accuracy improvement under adversarial attacks on ImageNet1K
Consistent robustness gains across eight benchmarks and four ViT architectures
Less than 3% increase in model parameters
Abstract
Vision Transformers (ViTs) demonstrate remarkable performance in image classification through visual-token interaction learning, particularly when equipped with local information via region attention or convolutions. Although such architectures improve the feature aggregation from different granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enables the modeling of global visual-token representations through local interactions, with its training strategies and architecture design conferring strong generalization ability and robustness against noisy input. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformers that uses NCA as plug-and-play adaptors between ViT layers, thus enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome…
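To make the "plug-and-play adaptor between ViT layers" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the 3x3 mean-neighbourhood perception, the two-matrix update rule, and the random per-token update mask (`fire_rate`) are illustrative assumptions borrowed from common NCA formulations, and the ViT layers are stand-in callables.

```python
import numpy as np

def nca_adaptor_step(tokens, w_perceive, w_update, rng, fire_rate=0.5):
    """One NCA-style update on an (H, W, C) token grid (hypothetical sketch)."""
    tokens = np.asarray(tokens, dtype=float)
    H, W, C = tokens.shape
    # Local perception: average each token's 3x3 neighbourhood (edge padding).
    padded = np.pad(tokens, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neigh = np.zeros_like(tokens)
    for dy in range(3):
        for dx in range(3):
            neigh += padded[dy:dy + H, dx:dx + W]
    neigh /= 9.0
    # Each token sees its own state plus the neighbourhood summary.
    perception = np.concatenate([tokens, neigh], axis=-1)   # (H, W, 2C)
    hidden = np.maximum(perception @ w_perceive, 0.0)       # ReLU
    delta = hidden @ w_update                                # (H, W, C)
    # Assumed stochastic update: only a random subset of tokens changes.
    mask = rng.random((H, W, 1)) < fire_rate
    return tokens + delta * mask                             # residual update

def forward_with_adaptor(seq, layers, insert_after, grid_hw, params, rng, steps=2):
    """Run stand-in ViT layers on (N, C) tokens, applying the adaptor
    after layer index `insert_after`."""
    H, W = grid_hw
    for i, layer in enumerate(layers):
        seq = layer(seq)
        if i == insert_after:
            grid = seq.reshape(H, W, -1)
            for _ in range(steps):
                grid = nca_adaptor_step(grid, *params, rng)
            seq = grid.reshape(H * W, -1)
    return seq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, hidden_dim = 8, 12
    params = (0.1 * rng.normal(size=(2 * C, hidden_dim)),
              0.1 * rng.normal(size=(hidden_dim, C)))
    layers = [lambda x: x, lambda x: x]   # placeholders for real ViT blocks
    seq = rng.normal(size=(16, C))        # 4x4 grid of tokens
    out = forward_with_adaptor(seq, layers, 0, (4, 4), params, rng)
    print(out.shape)
```

Because the adaptor keeps the token shape unchanged and adds its update residually, it can in principle sit between any two layers that operate on a spatial token grid. Where to insert it is a design decision the paper addresses separately.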
Taxonomy
Topics: Image Processing Techniques and Applications · CCD and CMOS Imaging Sensors
Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
