Same Image, Different Meanings: Toward Retrieval of Context-Dependent Meanings
Ayuto Tsutsumi, Ryosuke Kohita

TL;DR
This paper examines how the meaning of an image varies with context and proposes a framework for retrieval that accounts for semantic abstraction level and narrative grounding.
Contribution
It introduces the L1--L4 framework to organize image semantics by context dependence and evaluates how narrative context influences retrieval across these levels.
Findings
Concrete elements are stable across contexts.
Abstract elements shift with narrative context.
Injecting context on the image side improves retrieval.
Abstract
A scene of two people in the rain can convey hope and warmth in a reunion story or sorrow and finality in a farewell story. We investigate this context-dependent nature of image meaning and its implications for retrieval. Our key observation is that context dependency correlates with semantic abstraction: concrete elements (objects, actions) remain stable across contexts, while abstract elements (atmosphere, intent) shift with context. We operationalize this as the L1--L4 framework, organizing image semantics from context-independent (L1) to maximally context-dependent (L4). Using synthetic story contexts and queries for controlled evaluation, we examine how injecting narrative context into embeddings affects retrieval across abstraction levels. Concrete queries are retrievable without context, while abstract levels increasingly depend on narrative grounding. Where context is injected…
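To make "injecting narrative context into embeddings" concrete, here is a minimal sketch in Python. It is not the authors' method: the fusion rule (a convex combination of image and context embeddings, renormalized), the weight `alpha`, the function names, and the hand-made 3-dimensional vectors are all assumptions chosen for illustration. It shows only the general idea that one image paired with two different story contexts can yield two different retrieval targets.

```python
import numpy as np

def l2_normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def inject_context(image_emb, context_emb, alpha=0.5):
    """Hypothetical image-side fusion: blend image and context embeddings.

    alpha = 0 keeps the image embedding alone; alpha = 1 keeps only the context.
    """
    fused = (1.0 - alpha) * l2_normalize(image_emb) + alpha * l2_normalize(context_emb)
    return l2_normalize(fused)

def rank(query_emb, candidate_embs):
    """Return candidate indices sorted by cosine similarity, plus the scores."""
    scores = l2_normalize(candidate_embs) @ l2_normalize(query_emb)
    return np.argsort(-scores), scores

# Toy space (assumed): axis 0 = concrete content ("two people in the rain"),
# axis 1 = hopeful atmosphere, axis 2 = sorrowful atmosphere.
image = np.array([1.0, 0.0, 0.0])
reunion_ctx = np.array([0.0, 1.0, 0.0])
farewell_ctx = np.array([0.0, 0.0, 1.0])

# The same image embedded under two different narrative contexts.
candidates = np.stack([
    inject_context(image, reunion_ctx),   # index 0: reunion story
    inject_context(image, farewell_ctx),  # index 1: farewell story
])

concrete_query = np.array([1.0, 0.0, 0.0])  # asks about objects and actions
abstract_query = np.array([0.2, 1.0, 0.0])  # asks about a hopeful atmosphere

concrete_order, concrete_scores = rank(concrete_query, candidates)
abstract_order, abstract_scores = rank(abstract_query, candidates)
```

In this toy setup the concrete query scores both contextualized copies equally, because the context changes nothing about the depicted objects, while the abstract query ranks the reunion copy first. Without injection (`alpha=0`) the two candidates would be identical and the abstract query could not tell them apart.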
