Synthetic Data-Driven Prompt Tuning for Financial QA over Tables and Documents
Yaoning Yu, Kai-Min Chang, Ye Yu, Kai Wei, Haojing Luo, Haohan Wang

TL;DR
This paper presents a self-improving, synthetic data-driven prompt tuning framework that enhances financial reasoning capabilities of large language models on tables and documents without external labels.
Contribution
It introduces a closed-loop system combining synthetic data generation, verification, and prompt optimization to improve financial question-answering performance.
Findings
Achieves higher accuracy on the DocMath-Eval benchmark
Improves robustness of financial reasoning prompts
Reduces reliance on manually labeled datasets
Abstract
Financial documents such as earnings reports and balance sheets often involve long tables and multi-page reports. Large language models have become a new tool for numerical reasoning over these documents and for understanding them. However, prompt quality can have a major effect on how well LLMs perform these financial reasoning tasks. Most current methods either tune prompts on fixed datasets of financial text or tabular data, which limits their ability to adapt to new question types or document structures, or rely on costly, manually labeled or curated datasets to build the prompts. We introduce a self-improving prompt framework driven by data-augmented optimization. In this closed-loop process, we generate synthetic financial tables and document excerpts, verify their correctness and robustness, and then update the prompt based on the results. Specifically, our framework combines a synthetic data…
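The closed loop described in the abstract (generate synthetic examples, verify them, then update the prompt) can be illustrated with a toy sketch. This is not the authors' implementation: `generate_example`, `verify`, `toy_model`, and `tune_prompt` are hypothetical names, the "model" is a deterministic stand-in for an LLM call, and the prompt update is simplified to picking the best-scoring prompt from a fixed candidate list.

```python
import random
from typing import Callable, Dict, List

Example = Dict[str, object]


def generate_example(rng: random.Random) -> Example:
    """Build one synthetic table with a question whose answer is known by construction."""
    table = {"revenue": rng.randint(100, 999), "cost": rng.randint(10, 99)}
    return {
        "table": table,
        "question": "What is revenue minus cost?",
        "answer": table["revenue"] - table["cost"],
    }


def verify(example: Example) -> bool:
    """Keep an example only if its stored answer can be recomputed from its table."""
    table = example["table"]
    return example["answer"] == table["revenue"] - table["cost"]


def toy_model(prompt: str, example: Example) -> int:
    """Hypothetical stand-in for an LLM: it only subtracts when told to reason step by step."""
    table = example["table"]
    if "step by step" in prompt:
        return table["revenue"] - table["cost"]
    return table["revenue"]  # deliberately wrong shortcut answer


def accuracy(prompt: str, examples: List[Example], model: Callable[[str, Example], int]) -> float:
    """Fraction of examples the model answers correctly under the given prompt."""
    if not examples:
        return 0.0
    return sum(model(prompt, ex) == ex["answer"] for ex in examples) / len(examples)


def tune_prompt(candidates: List[str], model: Callable[[str, Example], int],
                rounds: int = 3, batch_size: int = 20, seed: int = 0) -> str:
    """Each round: generate a fresh batch, drop unverified examples, keep the best prompt."""
    rng = random.Random(seed)
    best = candidates[0]
    for _ in range(rounds):
        batch = [ex for ex in (generate_example(rng) for _ in range(batch_size)) if verify(ex)]
        # The current best is listed first, so it is kept on ties.
        best = max([best] + list(candidates), key=lambda p: accuracy(p, batch, model))
    return best


CANDIDATES = [
    "Answer the question.",
    "Reason step by step over the table, then answer.",
]
selected = tune_prompt(CANDIDATES, toy_model)
```

In a real setting the stub model would be replaced by an actual LLM call and the candidate list by prompts proposed from the observed errors; the sketch only shows how verified synthetic data can drive prompt selection without external labels.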
Taxonomy
Topics: Topic Modeling · Stock Market Forecasting Methods · Advanced Text Analysis Techniques
