Time Machine GPT
Felix Drinkall, Eghbal Rahimikia, Janet B. Pierrehumbert, Stefan Zohren

TL;DR
This paper introduces Time Machine GPT (TiMaGPT), a series of point-in-time language models designed to be uninformed about data from after their respective time periods. This supports the study of language evolution and avoids foresight of future information in dynamic, time-sensitive applications.
Contribution
The paper proposes a series of temporally specific language models that are nonprognosticative, in contrast to conventional temporal adaptation methods, which often further pre-train a static model on time-specific data.
Findings
Models are uninformed about future data
Facilitates studying language evolution
Useful for dynamic, time-sensitive applications
Abstract
Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora, reflecting the lack of datasets with temporal metadata. This approach is not aligned with the evolving nature of language. Conventional methods for creating temporally adapted language models often depend on further pre-training static models on time-specific data. This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT), specifically designed to be nonprognosticative. This ensures they remain uninformed about future factual information and linguistic changes. This strategy is beneficial for understanding language evolution and is of critical importance when applying models in dynamic contexts, such as time-series forecasting, where foresight of future information can prove problematic. We provide access to both the models and training datasets.
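The point-in-time idea described in the abstract can be illustrated with a minimal sketch. It assumes each document carries a publication date and that one corpus is built per cutoff; the `Document` class, the `point_in_time_corpus` helper, the sample texts, and the yearly cutoffs are all hypothetical and are not taken from the paper's released code or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    text: str
    published: date  # assumption: temporal metadata is available per document


def point_in_time_corpus(docs, cutoff):
    """Keep only documents published on or before the cutoff date."""
    return [d for d in docs if d.published <= cutoff]


# Toy documents, invented for illustration only.
docs = [
    Document("report from 2015", date(2015, 6, 1)),
    Document("report from 2019", date(2019, 3, 15)),
    Document("report from 2021", date(2021, 11, 30)),
]

# One corpus per cutoff: a model trained on corpora[year] would see
# nothing published after the end of that year.
cutoffs = [date(2015, 12, 31), date(2019, 12, 31), date(2021, 12, 31)]
corpora = {c.year: point_in_time_corpus(docs, c) for c in cutoffs}

for year, corpus in corpora.items():
    print(year, [d.text for d in corpus])
```

The filter only shows the data-side constraint. How the authors actually assemble and train on each corpus is not specified in this summary.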
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Database Systems and Queries · Numerical Methods and Algorithms
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Dense Connections · Adam · Layer Normalization · Attention Dropout · Multi-Head Attention
