Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP

Eunji Kim; Kyuhong Shim; Simyung Chang; Sungroh Yoon

arXiv:2410.08469·cs.LG·October 17, 2024

TL;DR

This paper introduces SToRI, a framework that reweights semantic tokens in CLIP's text embeddings to improve interpretability and controllability, demonstrated through experiments on image classification and retrieval.

Contribution

The paper presents SToRI, a novel method for differential semantic token weighting in CLIP, enhancing interpretability and user-controlled emphasis in text embeddings.

Findings

01 Improved interpretability of CLIP text embeddings.

02 Enhanced controllability over semantic emphasis.

03 Better performance in few-shot image classification and retrieval.

Abstract

A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision tasks through natural language. Despite the varying significance of different textual elements within a sentence depending on the context, efforts to account for variation of importance in constructing text embeddings have been lacking. We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI), which incorporates controllability as well. SToRI refines the text encoding process in CLIP by differentially weighting semantic elements based on contextual importance, enabling finer control over emphasis responsive to data-driven insights and user preferences. The efficacy of SToRI is demonstrated through comprehensive experiments on…
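
As a rough illustration of what differentially weighting tokens could look like, the sketch below assumes that a per-token weight scales each token's share of a softmax attention distribution. That mechanism, the function name `reweighted_attention`, and the example numbers are assumptions made for illustration; the abstract does not specify how SToRI applies its weights.

```python
import math


def reweighted_attention(scores, weights):
    """Softmax over attention logits with a per-token emphasis weight.

    Hypothetical helper, not taken from the paper.
    scores:  raw attention logits of one query over n key tokens.
    weights: non-negative emphasis weights, one per key token
             (1.0 leaves a token's contribution unchanged).
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    # Subtract the max logit for numerical stability.
    max_score = max(scores)
    # Scale each token's exponentiated score by its weight.
    numerators = [w * math.exp(s - max_score) for s, w in zip(scores, weights)]
    total = sum(numerators)
    if total == 0:
        raise ValueError("at least one token needs a positive weight")
    # Renormalize so the result is still a probability distribution.
    return [n / total for n in numerators]


# Illustrative logits for three tokens (made-up values).
scores = [1.0, 2.0, 0.5]

# All weights equal to 1.0 reproduces the ordinary softmax.
uniform = reweighted_attention(scores, [1.0, 1.0, 1.0])

# Raising the third token's weight shifts attention toward it.
emphasized = reweighted_attention(scores, [1.0, 1.0, 3.0])
```

Under this assumption, setting every weight to 1.0 recovers standard attention, so emphasis is a controlled departure from the unmodified encoder rather than a separate scoring scheme.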

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Contrastive Language-Image Pre-training