PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

Yuqun Zhang; Yuxuan Zhao; Sijia Chen

arXiv:2512.14735 · q-fin.CP · April 9, 2026

TL;DR

PyFi introduces a pyramid-structured dataset and adversarial framework for training vision language models to perform complex financial visual reasoning through progressive question chains.

Contribution

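The paper presents PyFi-600K, a scalable dataset of financial question-answer pairs synthesized without human annotation, together with PyFi-adv, the multi-agent adversarial mechanism that generates it, so that VLMs can learn to reason through financial images hierarchically.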

Findings

01. Fine-tuning improves model accuracy by up to 19.52%.

02. The PyFi-600K dataset enables detailed evaluation of financial visual reasoning.

03. Adversarial question chains facilitate progressive reasoning capabilities.

Abstract

This paper proposes PyFi, a novel framework for pyramid-like financial image understanding that enables vision language models (VLMs) to reason through question chains in a progressive, simple-to-complex manner. At the core of PyFi is PyFi-600K, a dataset comprising 600K financial question-answer pairs organized into a reasoning pyramid: questions at the base require only basic perception, while those toward the apex demand increasing levels of capability in financial visual understanding and expertise. This data is scalable because it is synthesized without human annotations, using PyFi-adv, a multi-agent adversarial mechanism under the Monte Carlo Tree Search (MCTS) paradigm, in which, for each image, a challenger agent competes with a solver agent by generating question chains that progressively probe deeper capability levels in financial visual reasoning. Leveraging this dataset, we…
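The pyramid organization described in the abstract can be illustrated with a small data-structure sketch. This is not the PyFi-600K schema or the PyFi-adv implementation: the names `QAPair`, `QuestionChain`, and `deepest_level_solved`, the integer `level` field, and the example questions are all hypothetical, chosen only to show a chain that moves from basic perception toward harder financial reasoning.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair; level 0 is the pyramid base (basic perception).

    Hypothetical structure, not the dataset's actual format.
    """
    question: str
    answer: str
    level: int


@dataclass
class QuestionChain:
    """An ordered chain of QA pairs about a single image (assumed layout)."""
    image_id: str
    pairs: List[QAPair] = field(default_factory=list)

    def append(self, pair: QAPair) -> None:
        # Enforce simple-to-complex ordering: levels never decrease along the chain.
        if self.pairs and pair.level < self.pairs[-1].level:
            raise ValueError("chain must progress from base toward apex")
        self.pairs.append(pair)


def deepest_level_solved(chain: QuestionChain, solver: Callable[[str], str]) -> int:
    """Return the highest level answered correctly before the first miss, or -1."""
    deepest = -1
    for pair in chain.pairs:
        if solver(pair.question).strip().lower() != pair.answer.strip().lower():
            break
        deepest = pair.level
    return deepest


# Invented example chain for illustration only.
chain = QuestionChain(image_id="chart_001")
chain.append(QAPair("What type of chart is shown?", "candlestick", level=0))
chain.append(QAPair("Did the closing price rise on the last day shown?", "yes", level=1))
chain.append(QAPair("Is the short-term trend bullish or bearish?", "bullish", level=2))

# A stub solver that only handles the perception-level question.
stub_answers = {"What type of chart is shown?": "Candlestick"}
result = deepest_level_solved(chain, lambda q: stub_answers.get(q, "unknown"))
```

Here `result` records how far up the pyramid the stub solver gets before its first wrong answer, which is one simple way to read a chain as a probe of capability depth.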

Code & Models

Repositories

AgenticFinLab/PyFi (GitHub)