An Effective Data Creation Pipeline to Generate High-quality Financial Instruction Data for Large Language Model
Ziao Wang, Jianning Wang, Junda Wu, Xiaofeng Zhang

TL;DR
This paper introduces a data creation pipeline that uses AI-human collaboration to generate a high-quality, large-scale financial instruction dataset for fine-tuning large language models, improving their financial response capabilities.
Contribution
The paper presents a novel pipeline that combines ChatGPT with feedback from human financial experts to produce a large, high-quality financial instruction dataset for model fine-tuning; no such resource was previously available.
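The abstract describes the pipeline as a ChatGPT-driven dialogue between an AI investor and an AI financial expert, refined with feedback from human financial experts. The Python sketch below shows only that overall shape, under stated assumptions. The chat model is a plain function (a stub here, not a real ChatGPT call). The role prompts, `generate_dialogue`, `build_dataset` and `stub_model` are all hypothetical. Human feedback is reduced to an accept/reject `approve` filter, which is a simplification: the paper says the feedback refines the dataset but does not spell out its exact form here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a chat model maps (system_prompt, history) -> reply text.
# A real pipeline would wrap a ChatGPT API call here; a stub keeps the sketch offline.
ChatModel = Callable[[str, List[Dict[str, str]]], str]

# Hypothetical role prompts; the paper's actual prompts are not reproduced here.
INVESTOR_PROMPT = "You are an investor asking a financial expert questions."
EXPERT_PROMPT = "You are a financial expert answering an investor's questions."


@dataclass
class Dialogue:
    topic: str
    turns: List[Dict[str, str]] = field(default_factory=list)


def generate_dialogue(topic: str, model: ChatModel, n_rounds: int = 3) -> Dialogue:
    """Alternate the AI investor and the AI expert for n_rounds question/answer pairs."""
    dialogue = Dialogue(topic=topic)
    for _ in range(n_rounds):
        # The investor sees the topic plus the conversation so far.
        question = model(f"{INVESTOR_PROMPT} Topic: {topic}", dialogue.turns)
        dialogue.turns.append({"role": "investor", "content": question})
        # The expert answers with the full history, including the new question.
        answer = model(EXPERT_PROMPT, dialogue.turns)
        dialogue.turns.append({"role": "expert", "content": answer})
    return dialogue


def build_dataset(
    topics: List[str],
    model: ChatModel,
    approve: Callable[[Dialogue], bool],
    n_rounds: int = 3,
) -> List[Dialogue]:
    """Generate one dialogue per topic and keep only those the reviewer approves."""
    dataset = []
    for topic in topics:
        dialogue = generate_dialogue(topic, model, n_rounds)
        if approve(dialogue):  # simplified stand-in for human financial-expert review
            dataset.append(dialogue)
    return dataset


def stub_model(system_prompt: str, history: List[Dict[str, str]]) -> str:
    """Offline placeholder: returns 'Q<n>' for the investor and 'A<n>' for the expert."""
    prefix = "Q" if system_prompt.startswith(INVESTOR_PROMPT) else "A"
    return f"{prefix}{len(history)}"


if __name__ == "__main__":
    data = build_dataset(
        ["bonds", "ETFs"],
        stub_model,
        approve=lambda d: all(t["content"] for t in d.turns),
    )
    print(len(data), len(data[0].turns))  # 2 6
```

The model is passed in as a plain function, so the loop does not depend on any particular API client. A richer review step, such as experts editing answers rather than accepting or rejecting whole dialogues, would only change `build_dataset`.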
Findings
Generated a dataset of 103k multi-turn chats for financial tasks
Model responses improved in accuracy and relevance
External GPT-4 evaluation confirms effectiveness
Abstract
At the dawn of the large language model era, it is critical to generate a high-quality financial dataset for fine-tuning large language models on finance-related tasks. This paper therefore presents a carefully designed data creation pipeline for this purpose. Specifically, we initiate a dialogue between an AI investor and a financial expert using ChatGPT and incorporate feedback from human financial experts to refine the dataset. This pipeline yielded a robust instruction-tuning dataset comprising 103k multi-turn chats. Extensive experiments on this dataset evaluate model performance using an external GPT-4 model as the judge. The promising experimental results verify that our approach led to significant advances in generating accurate, relevant, and financial-style responses from AI models, thus providing a powerful tool…
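The abstract says an external GPT-4 model served as the judge but does not specify how its verdicts were scored. One common way to summarise LLM-as-judge pairwise comparisons is to report win, tie and loss rates for the fine-tuned model against a baseline. The minimal Python sketch below assumes that protocol and uses the hypothetical labels `"win"`, `"tie"` and `"loss"`. It only aggregates verdicts that have already been collected and makes no call to GPT-4.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("win", "tie", "loss")  # assumed verdict labels, not taken from the paper


def summarize_verdicts(verdicts: Iterable[str]) -> Dict[str, float]:
    """Turn per-question judge verdicts into win/tie/loss rates."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to summarize")
    # Counter returns 0 for labels that never occurred.
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    print(summarize_verdicts(["win", "win", "tie", "loss"]))
    # {'win': 0.5, 'tie': 0.25, 'loss': 0.25}
```

The function rejects unknown labels instead of silently ignoring them. This keeps a malformed judge reply from skewing the reported rates.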
Taxonomy
Topics: Stock Market Forecasting Methods · Explainable Artificial Intelligence (XAI) · FinTech, Crowdfunding, Digital Finance
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dense Connections · Label Smoothing · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding
