Seeing the Abstract: Translating the Abstract Language for Vision Language Models

Davide Talon; Federico Girella; Ziyue Liu; Marco Cristani; Yiming Wang

arXiv:2505.03242·cs.CV·May 7, 2025

TL;DR

This paper shows that abstract language is widespread in vision-language datasets, especially in fashion, and introduces a training-free method that helps vision-language models understand abstract concepts, improving retrieval performance.

Contribution

The paper shows the importance of abstract language for VLMs and proposes ACT, a training-free, model-agnostic method for better representing abstract concepts in the latent space.

Findings

1. Abstract terms are prevalent and valuable in fashion VLM datasets.
2. Current VLMs lack sufficient abstract language understanding due to training data limitations.
3. ACT improves retrieval performance across various models without additional training.

Abstract

Natural language goes beyond dryly describing visual content. It contains rich abstract concepts to express feeling, creativity and properties that cannot be directly perceived. Yet, current research in Vision Language Models (VLMs) has not shed light on abstract-oriented language. Our research breaks new ground by uncovering its wide presence and under-estimated value, with extensive analysis. Particularly, we focus our investigation on the fashion domain, a highly-representative field with abstract expressions. By analyzing recent large-scale multimodal fashion datasets, we find that abstract terms have a dominant presence, rivaling the concrete ones, providing novel information, and being useful in the retrieval task. However, a critical challenge emerges: current general-purpose or fashion-specific VLMs are pre-trained with databases that lack sufficient abstract words in their text…

Code & Models

Repositories

davidetalon/fashionact (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Focus