The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining
Jesse Ponnock

TL;DR
This paper studies how continued pretraining on U.S. SEC filings improves the domain-specific validation loss of 1B and 3B-parameter Llama-3.2 models. Most of the gain arrives within the first 200M tokens, and general-domain loss stays essentially flat, which informs how future financial foundation models should be scaled.
Contribution
It provides an early-stage empirical scaling-law analysis of financial domain adaptation for large language models, showing that effective specialization is achievable with a modest token budget (up to 400M tokens) and minimal general-domain drift.
Findings
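The largest SEC-domain validation-loss gains occur within the first 200M tokens of continued pretraining, with diminishing returns thereafter.
Power-law fits yield shallow exponents, which the authors interpret as financial language being highly regular and efficiently learnable.
General-domain validation loss remains effectively unchanged across all token budgets, with no signs of catastrophic forgetting.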
Abstract
Domain-adaptive pretraining (DAPT) offers a practical path to specializing large language models for high-value domains without full retraining. We conduct an early-stage scaling-law analysis of continued pretraining on U.S. SEC filings, training 1B and 3B-parameter Llama-3.2 models on a 400M-token financial corpus with validation checkpoints at 50M, 100M, 200M, and 400M tokens. Results show consistent improvements in SEC-domain validation loss for both models, with the largest gains occurring within the first 200M tokens and diminishing returns thereafter. Power-law fits reveal shallow exponents, indicating that financial language is highly regular and efficiently learnable under continued pretraining. General-domain validation loss remains effectively unchanged across all token budgets, suggesting minimal drift and no signs of catastrophic forgetting. A data-efficiency frontier…
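To make the power-law fit mentioned in the abstract concrete, the following is a minimal sketch of how an exponent can be estimated from loss measured at a handful of token budgets. It assumes the simple form L(D) = A * D^(-alpha) and fits it by linear regression in log-log space; the paper may use a different functional form (for example, one with an irreducible-loss offset). The function name `fit_power_law` and the loss values are hypothetical: the losses are synthetic, generated from a known exponent, and are not results from the paper. Only the four token budgets come from the abstract.

```python
import numpy as np


def fit_power_law(tokens, losses):
    """Fit L(D) = A * D**(-alpha) by least squares in log-log space.

    Returns (A, alpha). Assumed functional form; not necessarily the
    one used in the paper.
    """
    log_d = np.log(np.asarray(tokens, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # log L = log A - alpha * log D, so the slope is -alpha.
    slope, intercept = np.polyfit(log_d, log_l, 1)
    return float(np.exp(intercept)), float(-slope)


# Token budgets matching the checkpoints named in the abstract.
tokens = np.array([50e6, 100e6, 200e6, 400e6])

# SYNTHETIC losses generated from a known shallow exponent,
# used only to show that the fit recovers it. Not paper data.
true_a, true_alpha = 5.0, 0.05
losses = true_a * tokens ** (-true_alpha)

a_hat, alpha_hat = fit_power_law(tokens, losses)
print(f"A = {a_hat:.4f}, alpha = {alpha_hat:.4f}")
```

A shallow exponent (small alpha) means each doubling of the token budget multiplies the loss by 2^(-alpha), a factor only slightly below 1. That is consistent with the diminishing returns past 200M tokens described above. With only four checkpoints, any such fit is weakly constrained and should be read as indicative.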
Taxonomy
Topics: Topic Modeling · Financial Distress and Bankruptcy Prediction · Domain Adaptation and Few-Shot Learning
