Explicit Dropout: Deterministic Regularization for Transformer Architectures
Vidhi Agrawal, Illia Oleksiienko, Alexandros Iosifidis

TL;DR
This paper introduces explicit dropout, a deterministic regularization method for Transformer models that replaces stochastic masking with explicit loss-based regularization, improving interpretability and control.
Contribution
It formulates dropout as an explicit additive regularizer for Transformers, enabling fine-grained regularization control without stochasticity.
Findings
Explicit dropout matches or outperforms traditional stochastic dropout.
Gains are consistent across image classification, temporal action detection, and audio classification.
Ablation studies show that the regularization is stable and controllable.
Abstract
Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than explicit optimization objectives. We propose a deterministic formulation that expresses dropout as an additive regularizer directly incorporated into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering attention query, key, value, and feed-forward components with independently controllable strengths. This formulation removes reliance on stochastic perturbations while providing clearer and fine-grained control over regularization strength. Experiments across image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network…
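To make the idea of "dropout as an additive regularizer" concrete, here is a minimal sketch for the simplest textbook case: a single linear predictor with squared error and inverted dropout on its inputs. It is not the paper's Transformer derivation, which is not reproduced on this page. In this case the expected stochastic loss equals the deterministic loss plus a penalty of p/(1-p) times the sum of (x_i w_i)^2. The function names, the toy data, and the drop rate p = 0.3 are all assumptions chosen for illustration.

```python
import itertools
import numpy as np


def explicit_dropout_loss(x, w, y, p):
    """Deterministic squared error plus the penalty dropout adds in expectation.

    Illustrative only: this is the linear, squared-error case, not the
    attention or feed-forward terms described in the abstract.
    """
    data_term = (y - x @ w) ** 2
    penalty = p / (1.0 - p) * np.sum((x * w) ** 2)
    return data_term + penalty


def expected_stochastic_loss(x, w, y, p):
    """Exact expectation of the squared error under inverted input dropout.

    Enumerates every binary keep-mask, so it is only practical for tiny inputs.
    """
    d = x.shape[0]
    total = 0.0
    for bits in itertools.product([0.0, 1.0], repeat=d):
        mask = np.array(bits)
        kept = mask.sum()
        prob = (1.0 - p) ** kept * p ** (d - kept)  # probability of this mask
        pred = (mask * x / (1.0 - p)) @ w           # inverted-dropout scaling
        total += prob * (y - pred) ** 2
    return total


# Hypothetical toy data.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=4)
y = 0.5
p = 0.3

explicit = explicit_dropout_loss(x, w, y, p)
expected = expected_stochastic_loss(x, w, y, p)
print(np.isclose(explicit, expected))
```

In this toy setting the coefficient p/(1-p) plays the role of a regularization strength that can be set directly, without sampling masks. The abstract describes separate, independently controllable strengths for the query, key, value, and feed-forward components; how those terms are derived is specific to the paper and is not shown here.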
