EL-VIT: Probing Vision Transformer with Interactive Visualization
Hong Zhou, Rui Zhang, Peifeng Lai, Chaoran Guo, Yong Wang, Zhida Sun, and Junjie Li

TL;DR
EL-VIT is an interactive visualization system that helps users understand the complex inner workings of Vision Transformers through multi-layered visual explanations and similarity analysis.
Contribution
This paper introduces EL-VIT, a novel interactive visual analytics tool designed specifically to probe and interpret Vision Transformer models.
Findings
EL-VIT effectively clarifies ViT's architecture and operations.
Users can better understand ViT through the visualization views.
EL-VIT demonstrates usability in practical scenarios.
Abstract
Nowadays, Vision Transformer (ViT) is widely utilized in various computer vision tasks, owing to its unique self-attention mechanism. However, the model architecture of ViT is complex and often challenging to comprehend, leading to a steep learning curve. ViT developers and users frequently encounter difficulties in interpreting its inner workings. Therefore, a visualization system is needed to assist ViT users in understanding its functionality. This paper introduces EL-VIT, an interactive visual analytics system designed to probe the Vision Transformer and facilitate a better understanding of its operations. The system consists of four layers of visualization views. The first three layers include model overview, knowledge background graph, and model detail view. These three layers elucidate the operation process of ViT from three perspectives: the overall model architecture, detailed…
Taxonomy
Topics: Data Visualization and Analytics · Online Learning and Analytics · Image and Video Quality Assessment
Methods: Visual Analytics · Attention Is All You Need · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Dropout · Linear Layer · Byte Pair Encoding · Adam
