TL;DR
This paper investigates semantic fixation in large vision-language models: the tendency to keep a default interpretation even when the prompt specifies an alternative, equally valid one. It also tests prompt and training interventions to understand and mitigate this bias.
Contribution
The study introduces VLM-Fix, a benchmark for evaluating semantic fixation, and demonstrates how prompt and training interventions can influence model interpretation biases.
Findings
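Across 14 open and closed VLMs, accuracy is consistently higher under standard rules than under inverse rules on identical board states, revealing a semantic-fixation gap.
Neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it.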
Late-layer activation steering can partially recover model performance.
Abstract
Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts…
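The paired-rule setup described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the summary does not name the four games, so tic-tac-toe stands in as a hypothetical example, and the function names (`line_owner`, `winner`, `fixation_gap`) and the definition of the gap as a simple accuracy difference are not taken from the paper. The idea is that the same terminal board is scored under a standard rule (completing a line wins) and an inverse rule (completing a line loses), and the gap is the accuracy under standard rules minus the accuracy under inverse rules.

```python
from typing import List, Optional

Board = List[List[str]]  # 3x3 grid of "X", "O", or "." (hypothetical encoding)

# All eight lines of a 3x3 board: rows, columns, and the two diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def line_owner(board: Board) -> Optional[str]:
    """Return the player who completed a line, or None if nobody did."""
    for line in LINES:
        cells = {board[r][c] for r, c in line}
        if len(cells) == 1 and "." not in cells:
            return cells.pop()
    return None


def winner(board: Board, rule: str) -> Optional[str]:
    """Ground-truth winner for the same board under a given rule.

    "standard": the player who completed a line wins.
    "inverse":  the player who completed a line loses (assumed inverse rule).
    """
    owner = line_owner(board)
    if owner is None:
        return None
    if rule == "standard":
        return owner
    if rule == "inverse":
        return "O" if owner == "X" else "X"
    raise ValueError(f"unknown rule: {rule}")


def fixation_gap(standard_correct: List[bool], inverse_correct: List[bool]) -> float:
    """Accuracy under standard rules minus accuracy under inverse rules."""
    acc_standard = sum(standard_correct) / len(standard_correct)
    acc_inverse = sum(inverse_correct) / len(inverse_correct)
    return acc_standard - acc_inverse


board = [
    ["X", "X", "X"],
    ["O", "O", "."],
    [".", ".", "."],
]
standard_answer = winner(board, "standard")
inverse_answer = winner(board, "inverse")
```

Because the board is identical in both conditions, perception is held fixed and only the rule mapping changes, so a positive gap points to a rule-mapping failure rather than a perception failure.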
