Chronologically Consistent Large Language Models

Songrun He; Linying Lv; Asaf Manela; Jimmy Wu

arXiv:2502.21206 · q-fin.GN · July 8, 2025

TL;DR

This paper introduces ChronoBERT and ChronoGPT, large language models trained only on text data available at each point in time. This reduces lookahead bias and improves the credibility of social science and finance applications.

Contribution

The authors develop a novel training framework for chronologically consistent large language models that effectively mitigate lookahead bias in social science and finance tasks.

Findings

01. The models outperform or match standard models such as BERT on NLP benchmarks.

02. Real-time model outputs achieve Sharpe ratios comparable to those of larger models in finance applications.

03. The framework yields more credible backtests and predictions in the social sciences.

Abstract

Large language models are increasingly used in the social sciences, but their training data can introduce lookahead bias and training leakage. A good chronologically consistent language model must use its training data efficiently to maintain accuracy despite the time restriction on that data. Here, we overcome this challenge by training a suite of chronologically consistent large language models, ChronoBERT and ChronoGPT, which incorporate only the text data that would have been available at each point in time. Despite this strict temporal constraint, our models achieve strong performance on natural language processing benchmarks, outperforming or matching widely used models (e.g., BERT), and remain competitive with larger open-weight models. Lookahead bias is model- and application-specific because even if a chronologically consistent language model has poorer language comprehension, a regression or…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Lookahead