Mitigating Coordinate Prediction Bias from Positional Encoding Failures

Xingjian Tao; Yiwei Wang; Yujun Cai; Yihong Luo; Kai Han; Jing Tang

arXiv:2510.22102·cs.CV·April 29, 2026

TL;DR

This paper introduces VPSG, a training-free inference correction method that mitigates coordinate prediction bias caused by positional encoding failures in multimodal models, improving localization accuracy.

Contribution

The paper presents VPSG, a novel inference-time correction technique that addresses positional encoding failures without retraining, enhancing coordinate prediction in vision-language models.

Findings

01 VPSG corrects coordinate drift on the ScreenSpot-Pro benchmark.

02 VPSG improves localization accuracy across model scales.

03 The method requires no retraining or fine-tuning.

Abstract

While Multimodal Large Language Models (MLLMs) excel at general vision-language tasks, precise coordinate prediction remains a significant challenge, particularly as high-resolution inputs cause visual positional encodings (VPEs) to degrade. We demonstrate that these encoding failures do not result in random noise but instead trigger predictable, directional biases, suggesting that models default to internal spatial priors when grounding signals are weak. To counteract this, we introduce Vision-PE Shuffle Guidance (VPSG), a training-free, inference-time correction method. VPSG isolates position-unconditioned tendencies by shuffling VPEs and utilizes this negative evidence to steer digit decoding through a lightweight finite-state machine. Evaluation on the ScreenSpot-Pro benchmark confirms that VPSG effectively rectifies coordinate drift, yielding consistent improvements in localization…
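To make the decoding idea concrete, the following is a minimal sketch of how negative evidence from a shuffled-VPE pass could steer digit selection. It is not the authors' implementation: the abstract does not give the guidance formula, so the contrast rule `(1 + alpha) * cond - alpha * shuffled` (borrowed from classifier-free-guidance-style decoding), the `alpha` parameter, the `"(x, y)"` output format, and the names `in_digit_state` and `guided_step` are all assumptions made for illustration.

```python
import math

DIGITS = "0123456789"

def log_softmax(logits):
    """Normalize a {token: logit} dict into log-probabilities."""
    m = max(logits.values())
    lse = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - lse for tok, v in logits.items()}

def in_digit_state(prefix):
    """Tiny finite-state machine: True while a coordinate is being decoded.

    Assumption (hypothetical format): coordinates are written as "(123, 456)".
    """
    state = "TEXT"
    for ch in prefix:
        if state == "TEXT" and ch == "(":
            state = "NUM"
        elif state == "NUM" and ch == ")":
            state = "TEXT"
    return state == "NUM"

def guided_step(prefix, cond_logits, shuffled_logits, alpha=1.0):
    """Pick the next token, applying guidance only to digit choices.

    cond_logits:     logits from the normal forward pass.
    shuffled_logits: logits from a pass with shuffled visual positional
                     encodings, treated as position-unconditioned evidence.
    alpha:           assumed guidance strength; 0 disables the correction.
    """
    best = max(cond_logits, key=cond_logits.get)
    # Outside a coordinate, or when the model wants a separator, do nothing.
    if not in_digit_state(prefix) or best not in DIGITS:
        return best
    cond = log_softmax(cond_logits)
    neg = log_softmax(shuffled_logits)
    # Re-rank digits: reward position-conditioned evidence, penalize the prior.
    scores = {
        tok: (1 + alpha) * cond[tok] - alpha * neg[tok]
        for tok in cond
        if tok in DIGITS
    }
    return max(scores, key=scores.get)

# Toy logits: the shuffled pass strongly prefers "1", hinting at a spatial prior.
cond = {"1": 2.0, "7": 1.8, ",": 0.0}
shuffled = {"1": 3.0, "7": 0.5, ",": 0.0}
print(guided_step("(", cond, shuffled, alpha=1.0))
print(guided_step("(", cond, shuffled, alpha=0.0))
```

In this toy case the digit the model prefers even without usable position information is down-weighted, so the guided choice differs from the plain argmax. Restricting the correction to digit positions leaves ordinary text tokens untouched.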

Code & Models

Repositories

taoxj2001/VPSG
github