AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer

Yitao Xu; Tong Zhang; Sabine Süsstrunk

arXiv:2406.08298 · cs.CV · November 22, 2024

TL;DR

AdaNCA enhances Vision Transformers by integrating Neural Cellular Automata as adaptable modules, significantly improving robustness against adversarial and out-of-distribution inputs with minimal parameter increase.

Contribution

This work introduces AdaNCA, a novel plug-and-play NCA-based adaptor for ViTs that boosts robustness and performance through efficient interaction learning and a strategy for choosing where in the network to insert the adaptors.

Findings

1. Over 10% accuracy improvement under adversarial attacks on ImageNet1K
2. Consistent robustness gains across eight benchmarks and four ViT architectures
3. Less than 3% increase in model parameters

Abstract

Vision Transformers (ViTs) demonstrate remarkable performance in image classification through visual-token interaction learning, particularly when equipped with local information via region attention or convolutions. Although such architectures improve the feature aggregation from different granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enables the modeling of global visual-token representations through local interactions, with its training strategies and architecture design conferring strong generalization ability and robustness against noisy input. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformers that uses NCA as plug-and-play adaptors between ViT layers, thus enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome…
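The abstract describes NCA as modeling global token representations through repeated local interactions, used as adaptors between ViT layers. The following is a minimal, hypothetical pure-Python sketch of that general idea, not AdaNCA's actual architecture. Assumptions: tokens sit on an H×W grid of C-dimensional vectors; each cell "perceives" its own state plus the mean of its 3×3 neighborhood; a made-up weight matrix `weights` (C rows, 2C columns) maps that perception to a residual update; and a stochastic `fire_rate` mask, a common NCA device, decides which cells update at each step. The helper names (`perceive`, `nca_step`, `nca_adaptor`, `forward_with_adaptors`) and the "after every block but the last" placement are illustrative only; the paper uses its own insertion strategy.

```python
import random
from typing import Callable, List

# Hypothetical layout: an H x W grid of C-dimensional token vectors.
Grid = List[List[List[float]]]


def perceive(grid: Grid, i: int, j: int) -> List[float]:
    """Concatenate cell (i, j)'s state with the mean of its in-bounds 3x3 neighborhood (itself included)."""
    height, width, channels = len(grid), len(grid[0]), len(grid[0][0])
    total = [0.0] * channels
    count = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if 0 <= ni < height and 0 <= nj < width:
                for k in range(channels):
                    total[k] += grid[ni][nj][k]
                count += 1
    return grid[i][j] + [t / count for t in total]


def nca_step(grid: Grid, weights: List[List[float]], fire_rate: float,
             rng: random.Random) -> Grid:
    """One synchronous NCA update: every cell reads the old grid; a random subset applies a residual update."""
    new_grid: Grid = []
    for i in range(len(grid)):
        row = []
        for j in range(len(grid[0])):
            state = list(grid[i][j])
            if rng.random() < fire_rate:  # stochastic update mask (assumed, not from the paper)
                p = perceive(grid, i, j)
                delta = [sum(w * x for w, x in zip(w_row, p)) for w_row in weights]
                state = [s + d for s, d in zip(state, delta)]
            row.append(state)
        new_grid.append(row)
    return new_grid


def nca_adaptor(grid: Grid, weights: List[List[float]], steps: int = 3,
                fire_rate: float = 0.5, seed: int = 0) -> Grid:
    """Hypothetical adaptor: repeat local updates so information spreads beyond one 3x3 window."""
    rng = random.Random(seed)
    for _ in range(steps):
        grid = nca_step(grid, weights, fire_rate, rng)
    return grid


def forward_with_adaptors(grid: Grid, blocks: List[Callable[[Grid], Grid]],
                          adaptor: Callable[[Grid], Grid]) -> Grid:
    """Apply blocks in order, inserting the adaptor between consecutive blocks (illustrative placement)."""
    for idx, block in enumerate(blocks):
        grid = block(grid)
        if idx < len(blocks) - 1:
            grid = adaptor(grid)
    return grid


if __name__ == "__main__":
    # Hypothetical C x 2C weights: nudge each channel by 0.1 x its neighborhood mean.
    W = [[0.0, 0.0, 0.1, 0.0],
         [0.0, 0.0, 0.0, 0.1]]
    tokens = [[[float(i), float(j)] for j in range(3)] for i in range(3)]

    def identity(g: Grid) -> Grid:  # stand-in for a ViT block
        return g

    out = forward_with_adaptors(tokens, [identity, identity], lambda g: nca_adaptor(g, W))
    print(len(out), len(out[0]), len(out[0][0]))  # 3 3 2
```

Because each step mixes only a 3×3 window, running `steps` updates lets a token depend on tokens up to `steps` cells away, which is one way to read "global representations through local interactions"; the adaptor also preserves the token grid's shape, which is what lets it sit between existing layers.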

Videos

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer · SlidesLive

Taxonomy

Topics: Image Processing Techniques and Applications · CCD and CMOS Imaging Sensors

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer