ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding and Segmentation

Hesam Hosseini; Ghazal Hosseini Mighan; Amirabbas Afzali; Sajjad Amini; Amir Houmansadr

arXiv:2411.12589 · cs.CV · January 6, 2026

TL;DR

ULTra is a framework that interprets Transformer embeddings to reveal semantic patterns, enabling unsupervised semantic segmentation and model explanation without fine-tuning. It achieves state-of-the-art results in unsupervised semantic segmentation.

Contribution

ULTra introduces an unsupervised interpretability framework for Transformers that uncovers semantic structure in latent token representations and refines segmentation performance by learning an external transformation matrix rather than fine-tuning the model.

Findings

01. Achieves state-of-the-art unsupervised semantic segmentation performance.

02. Effectively interprets latent token representations in computer vision (CV) and natural language processing (NLP) models.

03. Demonstrates broad applicability to model explanation and interpretability tasks.

Abstract

Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, their complexity makes latent token representations difficult to interpret. We introduce ULTra, a framework for interpreting Transformer embeddings and uncovering meaningful semantic patterns within them. ULTra enables unsupervised semantic segmentation using pre-trained models without requiring fine-tuning. Additionally, we propose a self-supervised training approach that refines segmentation performance by learning an external transformation matrix without modifying the underlying model. Our method achieves state-of-the-art performance in unsupervised semantic segmentation, outperforming existing segmentation methods. Furthermore, we validate ULTra for model interpretation in both synthetic and real-world scenarios, including Object Selection and interpretable text summarization using…
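The abstract describes segmenting images from the latent tokens of a frozen, pre-trained Transformer, optionally passed through an external transformation matrix so that the model itself is never modified. The sketch that follows is not ULTra's algorithm, which the abstract does not spell out; it is a generic stand-in under stated assumptions. Patch tokens are clustered with plain k-means (a swapped-in technique), the hypothetical `transform` argument plays the role of the external matrix (the identity here, not a learned one), and synthetic vectors replace real model embeddings. The helper names `kmeans` and `segment_tokens` are illustrative only.

```python
import numpy as np

def kmeans(x, k, iters=50):
    """Plain k-means on the rows of x (N x D); returns integer labels of shape (N,).

    Generic clustering used as a stand-in; not taken from the ULTra paper.
    """
    # Farthest-point initialization: start from the first token, then repeatedly
    # add the token farthest from all centers chosen so far (deterministic).
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        # Squared Euclidean distance from every token to every center: (N, k)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its tokens (keep it if it lost all tokens)
        new_centers = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def segment_tokens(tokens, grid_hw, k, transform=None):
    """Cluster frozen patch-token embeddings into an (H, W) segmentation mask.

    tokens:    (H*W, D) patch embeddings, assumed to come from a frozen model.
    transform: optional external (D, D) matrix applied to the tokens; a
               hypothetical placeholder for a learned matrix. The model is
               only read from, never modified.
    """
    x = tokens if transform is None else tokens @ transform
    # L2-normalize so clustering depends on direction (cosine-style similarity)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return kmeans(x, k).reshape(grid_hw)

# Synthetic stand-in for ViT patch tokens: the left and right halves of a
# 14x14 grid are drawn around two different "semantic" directions plus noise.
rng = np.random.default_rng(42)
H, W, D = 14, 14, 64
truth = np.zeros((H, W), dtype=int)
truth[:, W // 2:] = 1
means = rng.normal(size=(2, D))
tokens = means[truth.ravel()] + 0.1 * rng.normal(size=(H * W, D))

mask = segment_tokens(tokens, (H, W), k=2, transform=np.eye(D))
print(mask)
```

Because the tokens are only read, substituting a different `transform` changes the segmentation without touching any model weights. How such a matrix would be trained in a self-supervised way is deliberately left out, since the abstract does not describe the training objective.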

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization