Same Image, Different Meanings: Toward Retrieval of Context-Dependent Meanings

Ayuto Tsutsumi; Ryosuke Kohita

arXiv:2605.12905·cs.IR·May 14, 2026

TL;DR

This paper explores how image meanings vary with context, proposing a framework to improve retrieval by considering semantic abstraction levels and narrative grounding.

Contribution

The paper introduces the L1–L4 framework, which organizes image semantics from context-independent (L1) to maximally context-dependent (L4), and evaluates how narrative context influences retrieval across these levels.

Findings

01 Concrete elements are stable across contexts.

02 Abstract elements shift with narrative context.

03 Injecting context on the image side improves retrieval.

Abstract

A scene of two people in the rain can convey hope and warmth in a reunion story or sorrow and finality in a farewell story. We investigate this context-dependent nature of image meaning and its implications for retrieval. Our key observation is that context dependency correlates with semantic abstraction: concrete elements (objects, actions) remain stable across contexts, while abstract elements (atmosphere, intent) shift with context. We operationalize this as the L1–L4 framework, organizing image semantics from context-independent (L1) to maximally context-dependent (L4). Using synthetic story contexts and queries for controlled evaluation, we examine how injecting narrative context into embeddings affects retrieval across abstraction levels. Concrete queries are retrievable without context, while abstract levels increasingly depend on narrative grounding. Where context is injected…
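
The idea of injecting narrative context on the image side can be illustrated with a toy example. The sketch below is not the authors' method: the fusion rule (a normalized weighted sum), the weight `alpha`, the three-dimensional hand-made vectors, and the function names `inject_context` and `retrieve` are all assumptions chosen for illustration. It only shows why an identical image embedding cannot separate a "hopeful" query from a "sorrowful" one until story context is mixed in.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def inject_context(image_emb, context_emb, alpha=0.5):
    """Blend an image embedding with a story-context embedding.

    Hypothetical fusion rule: a normalized weighted sum. The paper's
    actual injection mechanism is not specified in the abstract.
    """
    blended = (1 - alpha) * normalize(image_emb) + alpha * normalize(context_emb)
    return normalize(blended)

def retrieve(query_emb, candidate_embs):
    """Return candidate indices ordered by cosine similarity, best first."""
    scores = normalize(candidate_embs) @ normalize(query_emb)
    return np.argsort(-scores)

# Toy axes (assumed): [rain scene, hope/warmth, sorrow/finality].
rain_image = np.array([1.0, 0.0, 0.0])       # same image used in both stories
reunion_story = np.array([0.0, 1.0, 0.0])    # narrative context A
farewell_story = np.array([0.0, 0.0, 1.0])   # narrative context B

# Without context, the two candidates are the same vector.
plain = np.stack([rain_image, rain_image])
# With context injected on the image side, they diverge.
with_ctx = np.stack([
    inject_context(rain_image, reunion_story),
    inject_context(rain_image, farewell_story),
])

hope_query = np.array([1.0, 1.0, 0.0])       # abstract query: hopeful rain scene
sorrow_query = np.array([1.0, 0.0, 1.0])     # abstract query: sorrowful rain scene
concrete_query = np.array([1.0, 0.0, 0.0])   # concrete query: rain scene only

plain_scores = normalize(plain) @ normalize(hope_query)
concrete_scores = with_ctx @ normalize(concrete_query)
```

In this toy setup `plain_scores` holds two equal values, so the abstract query cannot distinguish the two story instances, whereas `retrieve(hope_query, with_ctx)` ranks the reunion instance first and `retrieve(sorrow_query, with_ctx)` ranks the farewell instance first. The concrete query scores both context-injected candidates equally, which mirrors the observation that concrete elements stay stable across contexts.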
