Improving the Behaviour of Vision Transformers with Token-consistent Stochastic Layers
Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

TL;DR
This paper proposes token-consistent stochastic layers in vision transformers that enhance calibration, robustness, and privacy without significantly affecting performance, by encouraging reliance on token topology rather than values.
Contribution
Introduction of token-consistent stochastic layers in vision transformers that improve robustness and privacy while maintaining performance.
Findings
Improved network calibration and robustness.
Enhanced feature privacy in vision transformers.
Effective across multiple applications.
Abstract
We introduce token-consistent stochastic layers in vision transformers, without causing any severe drop in performance. The added stochasticity improves network calibration and robustness, and strengthens privacy. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are sampled from the uniform distribution, both during training and inference. The applied linear operations preserve the topological structure formed by the set of tokens passing through the shared multilayer perceptron. This operation encourages the learning of the recognition task to rely on the topological structures of the tokens, instead of their values, which in turn offers the desired robustness and privacy of the visual features. The effectiveness of the token-consistent stochasticity is…
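The idea of token consistency can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `token_consistent_stochastic_layer`, the per-channel scale-and-shift form of the linear operation, and the sampling ranges are all assumptions made for illustration. The only properties taken from the abstract are that the parameters are drawn from a uniform distribution and that the same draw is applied to every token.

```python
import numpy as np

def token_consistent_stochastic_layer(tokens, rng, low=0.5, high=1.5):
    """Apply one random linear map to all tokens (hypothetical sketch).

    tokens: array of shape (num_tokens, dim).
    The scale and shift are sampled once per call and shared by every
    token; the ranges [low, high] and [-1, 1] are illustrative assumptions.
    """
    dim = tokens.shape[-1]
    scale = rng.uniform(low, high, size=dim)   # one draw per channel, not per token
    shift = rng.uniform(-1.0, 1.0, size=dim)   # likewise shared across tokens
    return tokens * scale + shift, scale, shift

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))               # 4 tokens, 8 channels
out, scale, shift = token_consistent_stochastic_layer(tokens, rng)

# The shared shift cancels in token differences, so the relation between
# any two tokens changes only by the shared per-channel scale.
diff_in = tokens[0] - tokens[1]
diff_out = out[0] - out[1]
print(np.allclose(diff_out, diff_in * scale))
```

Because every token sees the same random parameters, individual token values change from call to call while the relations between tokens change only by a common factor. Under these assumptions, this is one way to read the abstract's claim that the network is pushed to rely on the structure of the token set rather than on raw values.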
Taxonomy
Topics: Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors · Neural dynamics and brain function
