ODE-ViT: Plug & Play Attention Layer from the Generalization of the ViT as an Ordinary Differential Equation
Carlos Boned Riera, David Romero Sanchez, Oriol Ramos Terrades

TL;DR
This paper introduces ODE-ViT, a novel Vision Transformer reformulated as an ODE system, achieving stable, interpretable, and efficient classification with fewer parameters, and proposes a teacher-student framework for performance enhancement.
Contribution
It presents ODE-ViT, a new ODE-based formulation of Vision Transformers, and a plug-and-play teacher-student training method to improve performance.
Findings
Achieves competitive classification accuracy with fewer parameters.
Surpasses prior ODE-based Transformer models in experiments.
Performance improves by over 10% with the teacher-student framework.
Abstract
In recent years, increasingly large models have achieved outstanding performance across computer vision tasks. However, these models demand substantial computational resources and storage, and their growing complexity limits our understanding of how they make decisions. Most of these architectures rely on the attention mechanism within Transformer-based designs. Building upon the connection between residual neural networks and ordinary differential equations (ODEs), we introduce ODE-ViT, a Vision Transformer reformulated as an ODE system that satisfies the conditions for well-posed and stable dynamics. Experiments on CIFAR-10 and CIFAR-100 demonstrate that ODE-ViT achieves stable, interpretable, and competitive performance with up to one order of magnitude fewer parameters, surpassing prior ODE-based Transformer approaches in classification tasks. We further propose a plug-and-play teacher-student…
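The connection between residual networks and ODEs that the abstract builds on can be illustrated with a small sketch. A residual update x + f(x) is one forward Euler step of dx/dt = f(x) with step size 1, so stacking blocks that share weights amounts to integrating that ODE over several steps. The code below is a generic illustration under stated assumptions, not the authors' ODE-ViT formulation: the names `attention_field` and `euler_integrate`, the single-head attention without normalization or MLP, the random weights, and the choice of forward Euler as the solver are all hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_field(x, Wq, Wk, Wv):
    """Hypothetical vector field f(x): single-head self-attention.

    x has shape (n_tokens, dim); the output has the same shape,
    so it can be used as the right-hand side of dx/dt = f(x).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ v

def euler_integrate(x0, params, t_end=1.0, steps=4):
    """Forward Euler: x <- x + h * f(x), repeated `steps` times.

    With steps=1 and t_end=1.0 this reduces to a single residual
    block x + f(x). The same weights are reused at every step.
    """
    h = t_end / steps
    x = x0.copy()
    for _ in range(steps):
        x = x + h * attention_field(x, *params)
    return x

# Toy data (assumed sizes): 5 tokens with 8 features each.
rng = np.random.default_rng(0)
n_tokens, dim = 5, 8
params = tuple(0.1 * rng.standard_normal((dim, dim)) for _ in range(3))
tokens = rng.standard_normal((n_tokens, dim))

out = euler_integrate(tokens, params, t_end=1.0, steps=4)
```

Because the weights are shared across integration steps, the parameter count in this sketch does not grow with the number of steps, which is one plausible reading of why an ODE formulation can use fewer parameters than a stack of independent layers.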
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Neural Networks and Reservoir Computing
