Beyond Semantics: Disentangling Information Scope in Sparse Autoencoders for CLIP

Yusung Ro; Jaehyun Choi; Junmo Kim

arXiv:2604.05724·cs.CV·April 8, 2026

TL;DR

This paper introduces information scope for Sparse Autoencoder (SAE) features of CLIP: how broadly a feature aggregates visual evidence, from localized, patch-specific cues to global, image-level signals. It proposes the Contextual Dependency Score (CDS) to quantify this property.

Contribution

The paper presents information scope as a dimension of interpretability complementary to semantic meaning, together with the CDS metric, and uses both to analyze how features of different scopes influence CLIP's predictions.

Findings

01

Features with different scopes have distinct impacts on CLIP's outputs.

02

Some SAE features respond consistently under spatial perturbations, while others shift with minor input changes.

03

Information scope is a key axis for understanding CLIP representations.

Abstract

Sparse Autoencoders (SAEs) have emerged as a powerful tool for interpreting the internal representations of CLIP vision encoders, yet existing analyses largely focus on the semantic meaning of individual features. We introduce information scope as a complementary dimension of interpretability that characterizes how broadly an SAE feature aggregates visual evidence, ranging from localized, patch-specific cues to global, image-level signals. We observe that some SAE features respond consistently across spatial perturbations, while others shift unpredictably with minor input changes, indicating a fundamental distinction in their underlying scope. To quantify this, we propose the Contextual Dependency Score (CDS), which separates positionally stable local scope features from positionally variant global scope features. Our experiments show that features of different information scopes exert…
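
The abstract as shown here does not give the CDS formula, so the sketch below is not the paper's metric. It is only a rough illustration of the observation the abstract describes: some features respond consistently across spatial perturbations and others do not. It assumes a hypothetical array of SAE activations with one row per perturbed view of the same image, and uses the coefficient of variation as a stand-in instability measure. The function name, the array layout, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def perturbation_instability(acts: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-feature instability across perturbed views of one image.

    acts: array of shape (n_views, n_features) holding non-negative SAE
          activations, one row per spatially perturbed view (hypothetical layout).
    Returns std / mean per feature: 0 means the feature responds identically
    in every view; larger values mean the response varies more.
    NOTE: this is an illustrative proxy, not the Contextual Dependency Score.
    """
    acts = np.asarray(acts, dtype=float)
    mean = acts.mean(axis=0)
    std = acts.std(axis=0)
    # eps avoids division by zero for features that never fire (score 0).
    return std / (mean + eps)


# Toy data (fabricated for the example, not from the paper).
rng = np.random.default_rng(0)
n_views = 8
stable = np.full(n_views, 2.0)             # same response under every view
unstable = rng.uniform(0.0, 4.0, n_views)  # response changes from view to view
acts = np.stack([stable, unstable], axis=1)

scores = perturbation_instability(acts)
print(scores)
```

In this toy setup the first feature gets a score of zero and the second a positive score. Whether a low or high score maps to local or global scope is defined by the paper's CDS, which this proxy does not attempt to reproduce.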
