The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining

Jesse Ponnock

arXiv:2512.12384·cs.LG·December 16, 2025

Open Access

TL;DR

This paper investigates how continued pretraining on U.S. SEC filings improves the domain-specific performance of 1B and 3B-parameter Llama-3.2 models. Most of the gain arrives within the first 200M tokens, general-domain loss stays effectively unchanged, and the results are used to guide future scaling of financial foundation models.

Contribution

It provides the first empirical scaling-law analysis of financial domain adaptation for large language models, demonstrating effective specialization with modest data and minimal domain drift.

Findings

01

The largest SEC-domain validation-loss gains occur within the first 200M tokens of continued pretraining, with diminishing returns thereafter.

02

Power-law fits yield shallow exponents, indicating that financial language is highly regular and efficiently learnable.

03

General-domain validation loss remains effectively unchanged across all token budgets, with no signs of catastrophic forgetting.

Abstract

Domain-adaptive pretraining (DAPT) offers a practical path to specializing large language models for high-value domains without full retraining. We conduct an early-stage scaling-law analysis of continued pretraining on U.S. SEC filings, training 1B and 3B-parameter Llama-3.2 models on a 400M-token financial corpus with validation checkpoints at 50M, 100M, 200M, and 400M tokens. Results show consistent improvements in SEC-domain validation loss for both models, with the largest gains occurring within the first 200M tokens and diminishing returns thereafter. Power-law fits reveal shallow exponents, indicating that financial language is highly regular and efficiently learnable under continued pretraining. General-domain validation loss remains effectively unchanged across all token budgets, suggesting minimal drift and no signs of catastrophic forgetting. A data-efficiency frontier…
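
As a rough illustration of what a power-law fit over the four checkpoints involves, the sketch below fits loss ≈ a · D^(-b) by ordinary least squares in log-log space, where D is the token budget. The functional form (a pure power law with no irreducible-loss offset), the function name `fit_power_law`, and the loss values are all assumptions made for illustration. The losses are synthetic, generated from assumed constants, and are not results from the paper; only the checkpoint budgets come from the abstract.

```python
import math


def fit_power_law(tokens, losses):
    """Fit loss ~= a * tokens**(-b) via least squares on log-transformed data.

    Returns (a, b). A small b corresponds to a shallow exponent.
    """
    xs = [math.log(t) for t in tokens]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Checkpoint token budgets listed in the abstract.
tokens = [50e6, 100e6, 200e6, 400e6]

# SYNTHETIC losses from assumed constants; NOT values reported in the paper.
a_true, b_true = 5.0, 0.05
losses = [a_true * t ** (-b_true) for t in tokens]

a_hat, b_hat = fit_power_law(tokens, losses)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

With only four checkpoints per model, a fit like this is weakly constrained, so the recovered exponent is better read as a descriptive summary than as a precise estimate.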

Taxonomy

Topics: Topic Modeling · Financial Distress and Bankruptcy Prediction · Domain Adaptation and Few-Shot Learning