Perceptual Flow Network for Visually Grounded Reasoning
Yangfu Li, Yuning Gong, Hongjian Zhan, Teng Li, Yuanhuiyi Lyu, Tianyi Chen, Qi Liu, Ziyuan Huang, Zhihang Zhong, Dandan Zheng, Yue Lu

TL;DR
PFlowNet (Perceptual Flow Network) improves visual reasoning in LVLMs by decoupling perception from reasoning and training with variational reinforcement learning, setting a new state of the art of 90.6% on V* Bench.
Contribution
PFlowNet introduces a self-conditioned generation process and a variational reinforcement learning framework that improve visual reasoning without rigid alignment to geometric priors from visual experts.
Findings
Sets new SOTA on V* Bench with 90.6%
Achieves 67.0% on MME-RealWorld-lite
Provides a performance guarantee for visual reasoning
Abstract
Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To mitigate this, current methods introduce geometric priors from visual experts as additional supervision. However, we observe that such supervision is typically suboptimal: it is biased toward geometric precision and offers limited reasoning utility. To bridge this gap, we propose the Perceptual Flow Network (PFlowNet), which eschews rigid alignment with expert priors and achieves interpretable yet more effective visual reasoning. Specifically, PFlowNet decouples perception from reasoning to establish a self-conditioned generation process. Based on this, it integrates multi-dimensional rewards with vicinal geometric shaping via variational reinforcement learning, thereby facilitating…
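The decoupling described in the abstract can be read through a generic latent-variable lens. The sketch below is not the paper's objective: the symbols x (image), q (question), z (perceptual trajectory), y (answer), the policy \pi_\theta, and the variational distribution q_\phi are all assumptions introduced here for illustration. It shows only the standard factorization and evidence lower bound that any such two-stage model admits.

```latex
% Illustrative notation (assumed, not taken from the paper):
%   x: image, q: question, z: perceptual trajectory, y: answer
% Decoupled, self-conditioned generation: perceive first, then reason.
\begin{align}
  p_\theta(y, z \mid x, q)
    &= \pi_\theta(z \mid x, q)\, p_\theta(y \mid z, x, q), \\
  \log p_\theta(y \mid x, q)
    &= \log \sum_{z} \pi_\theta(z \mid x, q)\, p_\theta(y \mid z, x, q) \\
    &\ge \mathbb{E}_{z \sim q_\phi(z \mid x, q, y)}
         \bigl[ \log p_\theta(y \mid z, x, q) \bigr]
       - \mathrm{KL}\bigl( q_\phi(z \mid x, q, y) \,\|\, \pi_\theta(z \mid x, q) \bigr).
\end{align}
```

Under this reading, plain MLE on y places no direct constraint on z, which is consistent with the abstract's remark that standard objectives fail to constrain visual trajectories. How PFlowNet actually combines its rewards and vicinal geometric shaping is not recoverable from the truncated abstract.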
