TL;DR
This paper introduces three new metrics for understanding how models adapt to data distributions that change over time. The metrics address a limitation of existing measures, which report only the average decline in performance.
Contribution
The work proposes three complementary metrics that distinguish a model's adaptation from the intrinsic difficulty of the data under temporal distribution shift.
Findings
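The proposed metrics reveal adaptation patterns that existing analyses hide.
Average-decline metrics make temporal degradation easy to misinterpret: they cannot tell whether a model fails to adapt or whether the data has become harder to learn.
The new metrics provide a clearer, more interpretable view of temporal robustness.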
Abstract
Evaluating robustness under temporal distribution shift remains an open challenge. Existing metrics quantify the average decline in performance but fail to capture how models adapt to evolving data. As a result, temporal degradation is often misinterpreted: when accuracy declines, it is unclear whether the model is failing to adapt or whether the data itself has become inherently more challenging to learn. In this work, we propose three complementary metrics to distinguish adaptation from intrinsic difficulty in the data. Together, these metrics provide a dynamic and interpretable view of model behavior under temporal distribution shift. Results show that our metrics uncover adaptation patterns hidden by existing analyses, offering a richer understanding of temporal robustness in evolving environments.