The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging
Masanori Hirano, Kentaro Imajo

TL;DR
This paper introduces a resource-efficient method to create instruction-tuned financial large language models by combining domain-specific continual pretraining with model merging, eliminating the need for instruction data.
Contribution
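It presents an approach that merges an instruction task vector, obtained from a publicly available general-purpose pretrained LLM and its instruction-tuned counterpart, with a domain-specific continually pretrained model, constructing instruction-tuned financial LLMs without additional instruction data.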
Findings
Successful construction of finance-specific instruction-tuned LLMs
Method leverages publicly available pretrained and instruction-tuned models
Produces effective financial LLMs without extra instruction data
Abstract
This paper proposes a novel method for constructing instruction-tuned large language models (LLMs) for finance without instruction data. Traditionally, developing such domain-specific LLMs has been resource-intensive, requiring a large dataset and significant computational power for continual pretraining and instruction tuning. Our study proposes a simpler approach that combines domain-specific continual pretraining with model merging. Given that general-purpose pretrained LLMs and their instruction-tuned LLMs are often publicly available, they can be leveraged to obtain the necessary instruction task vector. By merging this with a domain-specific pretrained vector, we can effectively create instruction-tuned LLMs for finance without additional instruction data. Our process involves two steps: first, we perform continual pretraining on financial data; second, we merge the…
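The merging idea described in the abstract can be illustrated with a minimal sketch. It assumes simple task arithmetic: the instruction task vector is the per-parameter difference between the instruction-tuned and base weights, and it is added to the finance-pretrained weights. The abstract is truncated before it specifies the actual merging procedure, so the function names (`task_vector`, `merge`), the `scale` parameter, and the toy weights below are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(tuned: Weights, base: Weights) -> Weights:
    """Return the per-parameter difference (tuned - base)."""
    if tuned.keys() != base.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [t - b for t, b in zip(tuned[name], base[name])]
        for name in base
    }


def merge(base: Weights, instruct: Weights, domain: Weights,
          scale: float = 1.0) -> Weights:
    """Add the instruction task vector to the domain-pretrained weights.

    `scale` is an assumed knob for weighting the task vector; 1.0 is plain addition.
    """
    if domain.keys() != base.keys():
        raise ValueError("models must share the same parameter names")
    tv = task_vector(instruct, base)
    return {
        name: [d + scale * v for d, v in zip(domain[name], tv[name])]
        for name in domain
    }


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
base = {"w": [1.0, 2.0]}      # general-purpose pretrained model
instruct = {"w": [1.5, 2.0]}  # its instruction-tuned counterpart
domain = {"w": [1.0, 3.0]}    # base after continual pretraining on financial text

merged = merge(base, instruct, domain)
print(merged)
```

In this toy case the first weight picks up the instruction-tuning shift and the second keeps the domain-pretraining shift. Real checkpoints would need matching architectures and parameter names for the subtraction to be meaningful.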
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Multi-Agent Systems and Negotiation · Business Process Modeling and Analysis
