FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models
Ziqiang Yuan, Kaiyuan Wang, Shoutai Zhu, Ye Yuan, Jingya Zhou, Yanlin Zhu, Wenqi Wei

TL;DR
FinLLMs introduces a novel framework that leverages large language models to automatically generate financial question-answering datasets, reducing manual annotation costs and improving model performance in financial reasoning tasks.
Contribution
The paper presents a method for generating financial QA data with LLMs from financial formulas and a graph of the variables they share, expanding scarce data resources while reducing manual annotation cost.
Findings
Synthetic data improves financial reasoning model performance.
Training on FinLLMs-generated data yields better results than training on existing benchmark datasets.
Graph-based formula augmentation enhances data diversity.
Abstract
Large language models (LLMs) usually rely on extensive training datasets. In the financial domain, creating numerical reasoning datasets that include a mix of tables and long text often involves substantial manual annotation expenses. To address the limited data resources and reduce the annotation cost, we introduce FinLLMs, a method for generating financial question-answering data based on common financial formulas using large language models. First, we compile a list of common financial formulas and construct a graph based on the variables these formulas employ. We then augment the formula set by combining those that share identical variables as new elements. Specifically, we explore formulas obtained by manual annotation and merge those formulas with shared variables by traversing the constructed graph. Finally, utilizing GPT-3.5, we generate financial question-answering data that…
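The graph construction and formula-merging steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Formula` class, the two example formulas, and the rule of substituting one formula into another when the first one's output is an input of the second are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Formula:
    """Hypothetical container for one financial formula."""
    output: str          # variable the formula computes
    inputs: frozenset    # variables the formula consumes
    expr: str            # human-readable right-hand side


def build_variable_graph(formulas):
    """Map each variable name to the indices of formulas that mention it."""
    graph = {}
    for idx, f in enumerate(formulas):
        for var in f.inputs | {f.output}:
            graph.setdefault(var, set()).add(idx)
    return graph


def merge_formulas(formulas):
    """Compose formula pairs linked by a shared variable.

    Assumed merge rule: if formula A produces a variable that formula B
    consumes, substitute A's expression into B to form a new formula.
    """
    graph = build_variable_graph(formulas)
    merged = []
    for var, idxs in graph.items():
        for i, j in permutations(sorted(idxs), 2):
            producer, consumer = formulas[i], formulas[j]
            if producer.output == var and var in consumer.inputs:
                # Whole-word match so "revenue" does not hit "net_revenue".
                pattern = rf"\b{re.escape(var)}\b"
                merged.append(Formula(
                    output=consumer.output,
                    inputs=(consumer.inputs - {var}) | producer.inputs,
                    expr=re.sub(pattern, lambda _: f"({producer.expr})",
                                consumer.expr),
                ))
    return merged


# Illustrative formulas (assumed, not taken from the paper's formula list).
formulas = [
    Formula("gross_profit",
            frozenset({"revenue", "cost_of_goods_sold"}),
            "revenue - cost_of_goods_sold"),
    Formula("gross_margin",
            frozenset({"gross_profit", "revenue"}),
            "gross_profit / revenue"),
]

merged = merge_formulas(formulas)
for f in merged:
    print(f"{f.output} = {f.expr}")
```

Here the two formulas share `gross_profit`, so the sketch derives a composed formula for `gross_margin` that depends only on `revenue` and `cost_of_goods_sold`. The merged formulas would then serve as additional templates for question generation.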
Taxonomy
Topics: Topic Modeling
Methods: Sparse Evolutionary Training · Attention Is All You Need · Cosine Annealing · Adam · Residual Connection · Linear Layer · Multi-Head Attention · Byte Pair Encoding · Softmax
