Context-Dependent Affordance Computation in Vision-Language Models

Murad Farzulla

arXiv:2603.04419 · cs.CL · March 6, 2026

Open Access

TL;DR

This paper shows that the affordances vision-language models compute for a scene vary strongly with the context they are primed with, at both the lexical and the semantic level, which has implications for dynamic world modeling in robotics.

Contribution

The paper provides the first large-scale quantitative analysis of context-dependent affordance drift in VLMs, identifies stable latent factors, and underlines the importance of context in affordance understanding.

Findings

01

Over 90% of the lexical scene description is context-dependent (mean Jaccard similarity between context conditions: 0.095).

02

Sentence-level cosine similarity averages 0.415, corresponding to 58.5% context dependence at the semantic level.

03

Stable latent factors include a 'Culinary Manifold' and an 'Access Axis'.

Abstract

We characterize the phenomenon of context-dependent affordance computation in vision-language models (VLMs). Through a large-scale computational study (n=3,213 scene-context pairs from COCO-2017) using Qwen-VL 30B and LLaVA-1.5-13B subject to systematic context priming across 7 agentic personas, we demonstrate massive affordance drift: mean Jaccard similarity between context conditions is 0.095 (95% CI: [0.093, 0.096], p < 0.0001), indicating that >90% of lexical scene description is context-dependent. Sentence-level cosine similarity confirms substantial drift at the semantic level (mean = 0.415, 58.5% context-dependent). Stochastic baseline experiments (2,384 inference runs across 4 temperatures and 5 seeds) confirm this drift reflects genuine context effects rather than generation noise: within-prime variance is substantially lower than cross-prime variance across all conditions.…
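The two headline figures follow from simple similarity arithmetic: context dependence is one minus the mean similarity, so 1 - 0.095 gives just over 90% and 1 - 0.415 gives 58.5%. The Python sketch below illustrates that calculation under stated assumptions. The whitespace tokenization, the helper names (`jaccard`, `cosine`, `mean_pairwise`, `context_dependence`), and the toy persona descriptions are all hypothetical and are not taken from the paper, whose exact preprocessing and embedding model are not given here.

```python
import math
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two texts.

    Assumption: plain whitespace tokenization; the paper may tokenize differently.
    """
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_pairwise(items, similarity) -> float:
    """Mean similarity over all unordered pairs of items."""
    pairs = list(combinations(items, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def context_dependence(mean_similarity: float) -> float:
    """Share of the description that changes with context: 1 - mean similarity."""
    return 1.0 - mean_similarity


# Hypothetical affordance descriptions of one scene under three personas.
descriptions = [
    "cut chop slice knife board",   # e.g. a cook
    "grab climb knife hide",        # e.g. a child
    "repair lift board tighten",    # e.g. a mechanic
]

drift = context_dependence(mean_pairwise(descriptions, jaccard))
print(f"toy lexical context dependence: {drift:.3f}")

# Arithmetic behind the reported figures.
print(f"lexical:  {context_dependence(0.095):.3f}")
print(f"semantic: {context_dependence(0.415):.3f}")
```

For the semantic measure, `cosine` would be applied to sentence embeddings of each description rather than to word sets; which embedding model to use is left open in this sketch.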

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Autonomous Vehicle Technology and Safety