Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs
Wentao Wan, Kaiyu Wu, Qingyang Ma, Nan Kang, Yunjie Chen, Liang Lin, Keze Wang

TL;DR
This paper introduces EVPG, a method that recasts the non-differentiable visual programming process as differentiable probabilistic inference, enabling end-to-end training and improving visual reasoning performance on the reported benchmarks.
Contribution
The paper proposes a probabilistic graph-based approach to optimize visual programming frameworks for visual reasoning, allowing gradient-based learning using only final task labels.
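To make the idea of learning from final labels alone more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the two-module program, the `soft_and` relaxation (a product of probabilities under an independence assumption), and the names `train_step` and `logits` are all hypothetical and chosen only for illustration. The point it shows is that once a discrete program step is replaced by a probabilistic expression, the gradient of a loss on the final answer can flow back to each sub-module's output without any sub-task labels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_and(p_a, p_b):
    # Probability that both sub-task outputs hold, assuming independence.
    # This relaxation is an illustrative assumption, not the paper's formula.
    return p_a * p_b

def train_step(logits, label, lr=0.5):
    """One gradient step on two hypothetical sub-module logits,
    supervised only by the final task label (0 or 1)."""
    p_a, p_b = sigmoid(logits[0]), sigmoid(logits[1])
    p = soft_and(p_a, p_b)
    eps = 1e-9  # avoids log(0) and division by zero
    # Binary cross-entropy on the final answer only.
    loss = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
    # Derivative of the loss with respect to the final probability.
    dl_dp = -(label / (p + eps)) + (1 - label) / (1 - p + eps)
    # Chain rule through the product and each sigmoid.
    grad_a = dl_dp * p_b * p_a * (1 - p_a)
    grad_b = dl_dp * p_a * p_b * (1 - p_b)
    new_logits = [logits[0] - lr * grad_a, logits[1] - lr * grad_b]
    return new_logits, loss

# Both hypothetical modules start undecided (probability 0.5 each).
logits = [0.0, 0.0]
losses = []
for _ in range(50):
    logits, loss = train_step(logits, label=1)
    losses.append(loss)

final_p = soft_and(sigmoid(logits[0]), sigmoid(logits[1]))
```

In this toy setting the loss on the final label falls over the 50 steps and both sub-module probabilities rise, even though neither module ever receives its own label. A real system would use an autodiff framework and pre-trained vision modules rather than hand-derived gradients on two scalars.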
Findings
Significant performance improvements on GQA, NLVRv2, and Open Images datasets.
Effective transformation of VP into a differentiable process via probabilistic graphs.
Enhanced end-to-end training capability for VP in complex VR tasks.
Abstract
Recently, Visual Programming (VP) based on large language models (LLMs) has rapidly developed and demonstrated significant potential in complex Visual Reasoning (VR) tasks. Previous works to enhance VP have primarily focused on improving the quality of LLM-generated visual programs. However, they have neglected to optimize the VP-invoked pre-trained models, which serve as modules for the visual sub-tasks decomposed from the targeted tasks by VP. The difficulty is that there are only final labels of targeted VR tasks rather than labels of sub-tasks. Besides, the non-differentiable nature of VP impedes the direct use of efficient gradient-based optimization methods to leverage final labels for end-to-end learning of the entire VP framework. To overcome these issues, we propose EVPG, a method to Enhance Visual Programming for visual reasoning via Probabilistic Graphs. Specifically, we…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Data Visualization and Analytics
