# Generalization vs. Memorization in Autoregressive Deep Learning: Or, Examining Temporal Decay of Gradient Coherence

**Authors:** James Amarel, Nicolas Hengartner, Robyn Miller, Kamaljeet Singh, Siddharth Mansingh, Arvind Mohan, Benjamin Migliori, Emily Casleton, Alexei Skurikhin, Earl Lawrence, Gerd J. Kunde

arXiv: 2509.00024 · 2026-01-21

## TL;DR

This paper uses influence functions to examine whether autoregressive deep learning surrogates for PDEs genuinely generalize or merely memorize, revealing limitations of standard models and training routines and informing the design of improved surrogates for scientific discovery.

## Contribution

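The paper applies the influence function formalism to characterize how autoregressive PDE surrogates assimilate and propagate information from diverse physical scenarios, exposing fundamental limitations of standard models and training routines and offering actionable insights for surrogate design.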
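For orientation, the classical influence-function estimate scores a training example by how much upweighting it would change the loss on a chosen test example: `influence_i = -g_test^T H^{-1} g_train_i`, where `H` is the Hessian of the training objective at the fitted parameters. The sketch below illustrates that general idea on ridge regression, where `H` is available in closed form. It is not the paper's method or code: the model, the data, and the names `fit_ridge`, `squared_error`, and `influence_on_test_loss` are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam, weights=None):
    """Minimize sum_i weights_i * 0.5 * (x_i . w - y_i)^2 + 0.5 * lam * ||w||^2.

    With weights=None every example gets weight 1/n (the usual mean loss).
    Returns the fitted parameters w and the Hessian H of the objective.
    """
    n, d = X.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)
    H = (X * weights[:, None]).T @ X + lam * np.eye(d)
    w = np.linalg.solve(H, X.T @ (weights * y))
    return w, H


def squared_error(w, x, y):
    """Loss of a single example under parameters w."""
    return 0.5 * (x @ w - y) ** 2


def influence_on_test_loss(X, y, x_test, y_test, lam=1e-2):
    """Influence of each training example on one test loss.

    Entry i approximates d(test loss)/d(eps) when training example i is
    upweighted by eps. Negative values mark examples whose upweighting
    lowers the test loss.
    """
    w, H = fit_ridge(X, y, lam)
    g_train = (X @ w - y)[:, None] * X          # per-example gradients, shape (n, d)
    g_test = (x_test @ w - y_test) * x_test     # test-loss gradient, shape (d,)
    # H is symmetric, so g_test^T H^{-1} g_i equals g_i . (H^{-1} g_test).
    return -g_train @ np.linalg.solve(H, g_test)


# Hypothetical synthetic data, used only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
x_test, y_test = rng.normal(size=3), 0.3

scores = influence_on_test_loss(X, y, x_test, y_test)
print("most helpful training index:", int(np.argmin(scores)))
print("most harmful training index:", int(np.argmax(scores)))
```

For deep surrogates the Hessian is far too large to form or invert directly, so practical influence estimates rely on approximations such as iterative inverse-Hessian-vector products. The closed-form case here only shows what quantity such approximations target.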

## Key findings

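- Standard models and training routines show fundamental limitations in generalizing beyond their training data
- Influence functions characterize how surrogates assimilate and propagate information from diverse physical scenarios
- The analysis yields actionable insights for designing improved surrogates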

## Abstract

Foundation models trained as autoregressive PDE surrogates hold significant promise for accelerating scientific discovery through their capacity to both extrapolate beyond training regimes and efficiently adapt to downstream tasks despite a paucity of examples for fine-tuning. However, reliably achieving genuine generalization, a necessary capability for producing novel scientific insights and robustly performing during deployment, remains a critical challenge. Establishing whether or not these requirements are met demands evaluation metrics capable of clearly distinguishing genuine model generalization from mere memorization. We apply the influence function formalism to systematically characterize how autoregressive PDE surrogates assimilate and propagate information derived from diverse physical scenarios, revealing fundamental limitations of standard models and training routines in addition to providing actionable insights regarding the design of improved surrogates.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00024/full.md

## Figures

39 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00024/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/2509.00024/full.md

---
Source: https://tomesphere.com/paper/2509.00024