ASCENT-ViT: Attention-based Scale-aware Concept Learning Framework for Enhanced Alignment in Vision Transformers
Sanchit Sinha, Guangzhi Xiong, Aidong Zhang

TL;DR
ASCENT-ViT introduces an attention-based, scale-aware concept learning framework for Vision Transformers, enhancing interpretability and predictive accuracy by aligning multiscale features with human-understandable concepts.
Contribution
The paper proposes a novel scale- and position-aware concept learning method that integrates with ViTs, improving interpretability and performance over existing generic explainability modules.
Findings
Improves predictive accuracy on multiple datasets
Provides accurate, robust concept explanations
Enhances interpretability of Vision Transformers
Abstract
As Vision Transformers (ViTs) are increasingly adopted in sensitive vision applications, there is a growing demand for improved interpretability. This has led to efforts to forward-align these models with carefully annotated, abstract, human-understandable semantic entities: concepts. Concepts provide global rationales for the model's predictions and can be quickly understood and intervened on by domain experts. Most current research focuses on designing model-agnostic, plug-and-play, generic concept-based explainability modules that do not incorporate the inner workings of foundation models (e.g., inductive biases, scale invariance, etc.) during training. To alleviate this issue for ViTs, in this paper, we propose ASCENT-ViT, an attention-based concept learning framework that effectively composes scale- and position-aware representations from multiscale feature pyramids and ViT patch…
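The abstract is cut off before the architecture is described, so the following is only a minimal sketch of the general idea it names: learned concept queries attending, via softmax attention, over tokens pooled from several feature-map scales. Everything here is an assumption made for illustration, not the authors' implementation: the function name `concept_cross_attention`, the additive per-scale embedding, the tensor shapes, and the random inputs are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concept_cross_attention(concept_queries, feature_maps, scale_embeddings):
    """Hypothetical sketch: concept queries attend over multiscale tokens.

    concept_queries:  (K, d) array, one query vector per concept (assumed).
    feature_maps:     list of S arrays, each (H_s, W_s, d), one per scale.
    scale_embeddings: (S, d) array added to tokens to mark their scale (assumed).
    """
    tokens = []
    for s, fmap in enumerate(feature_maps):
        h, w, d = fmap.shape
        # Flatten spatial positions into tokens and tag them with their scale.
        tokens.append(fmap.reshape(h * w, d) + scale_embeddings[s])
    tokens = np.concatenate(tokens, axis=0)           # (N, d)

    d = concept_queries.shape[1]
    scores = concept_queries @ tokens.T / np.sqrt(d)  # (K, N)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    concept_repr = weights @ tokens                   # (K, d)
    return concept_repr, weights

# Toy usage with random data: three scales, five concepts, d = 16.
rng = np.random.default_rng(0)
feature_maps = [rng.normal(size=(8, 8, 16)),
                rng.normal(size=(4, 4, 16)),
                rng.normal(size=(2, 2, 16))]
scale_embeddings = rng.normal(size=(3, 16))
concept_queries = rng.normal(size=(5, 16))
concept_repr, weights = concept_cross_attention(
    concept_queries, feature_maps, scale_embeddings)
```

Each row of `weights` is a distribution over all spatial positions at all scales, so in a sketch like this it can be reshaped per scale and inspected as a map of where a concept attends.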
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Robotics and Automated Systems
Methods: Softmax · Attention Is All You Need
