FinGPT: Democratizing Internet-scale Data for Financial Large Language Models
Xiao-Yang Liu, Guoxuan Wang, Hongyang Yang, Daochen Zha

TL;DR
This paper introduces FinGPT, an open-source framework that democratizes access to real-time financial data for training large language models, enabling innovative applications in finance through data curation, fine-tuning strategies, and low-cost customization.
Contribution
The paper presents FinGPT, an open-source data collection and fine-tuning framework for financial LLMs (FinLLMs) that addresses the scarcity of financial text data and lowers the barrier to developing FinLLMs.
Findings
Successfully collected data from 34 sources for FinGPT
Demonstrated effective fine-tuning using Reinforcement Learning with Stock Prices (RLSP), which draws on market feedback
Enabled applications like robo-advisors and sentiment analysis
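As a rough illustration of what "market feedback" could mean in practice, the sketch below labels a news headline by the relative price change around its publication and pairs the headline with that label for supervised fine-tuning. This is an assumption-laden sketch, not the authors' RLSP procedure: the `NewsItem` type, the function names, and the 2% threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NewsItem:
    """Hypothetical record: a headline plus prices around its publication."""
    headline: str
    price_before: float  # price before the news was published
    price_after: float   # price after the news was published


def label_from_price_change(item: NewsItem, threshold: float = 0.02) -> str:
    """Map the relative price change to a sentiment label.

    The 2% threshold is an illustrative assumption, not a value from the paper.
    """
    change = (item.price_after - item.price_before) / item.price_before
    if change > threshold:
        return "positive"
    if change < -threshold:
        return "negative"
    return "neutral"


def build_training_pairs(items: List[NewsItem]) -> List[Tuple[str, str]]:
    """Pair each headline with its market-derived label."""
    return [(item.headline, label_from_price_change(item)) for item in items]
```

Pairs produced this way could serve as inputs to a sentiment-analysis fine-tuning run; how the price window and threshold are chosen would strongly affect label quality given the low signal-to-noise ratio the abstract mentions.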
Abstract
Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like texts, which may potentially revolutionize the finance industry. However, existing LLMs often fall short in the financial field, which is mainly attributed to the disparities between general text data and financial text data. Unfortunately, there is only a limited number of financial text datasets available, and BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address the challenges, we introduce an open-source and data-centric framework, Financial Generative Pre-trained Transformer (FinGPT), that automates the collection and…
Taxonomy
Topics: Stock Market Forecasting Methods · FinTech, Crowdfunding, Digital Finance · Energy Load and Power Forecasting
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Dense Connections · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection
