Training and Evaluating Causal Forecasting Models for Time-Series

Thomas Crasson; Yacine Nabet; Mathias Lécuyer

arXiv:2411.00126·cs.LG·November 4, 2024

TL;DR

This paper introduces a new framework for training causal time-series models that generalize better to out-of-distribution scenarios, and evaluates them with economics-inspired methods.

Contribution

It extends the orthogonal statistical learning framework to causal forecasting, enabling models to generalize beyond their training distribution.
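To make "orthogonal" concrete, here is a minimal sketch of the residual-on-residual (R-learner style) form of an orthogonal loss on static toy data. It is not the authors' time-series model: it assumes a single covariate x, a binary treatment T, linear nuisance models fitted with 2-fold cross-fitting, and a linear effect model tau(x) = b0 + b1 x. The functions `fit_ols`, `cross_fit_nuisances`, `orthogonal_effect`, and `simulate` are hypothetical names for illustration only.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    def design(Z):
        return np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Z: design(Z) @ coef


def cross_fit_nuisances(X, T, Y, n_folds=2, seed=0):
    """Out-of-fold estimates of m(x) = E[Y | x] and e(x) = E[T | x]."""
    rng = np.random.default_rng(seed)
    fold = rng.integers(0, n_folds, size=len(Y))
    m_hat = np.empty(len(Y))
    e_hat = np.empty(len(Y))
    for k in range(n_folds):
        train, test = fold != k, fold == k
        # Nuisances for fold k are fitted only on the other folds.
        m_hat[test] = fit_ols(X[train], Y[train])(X[test])
        e_hat[test] = fit_ols(X[train], T[train])(X[test])
    return m_hat, e_hat


def orthogonal_effect(X, T, Y):
    """Minimise sum_i [(Y_i - m(x_i)) - (T_i - e(x_i)) * tau(x_i)]^2
    over a linear effect model tau(x) = b0 + b1 * x; returns [b0, b1]."""
    m_hat, e_hat = cross_fit_nuisances(X, T, Y)
    y_res = Y - m_hat  # outcome residual
    t_res = T - e_hat  # treatment residual
    phi = np.column_stack([np.ones(len(Y)), X])
    beta, *_ = np.linalg.lstsq(phi * t_res[:, None], y_res, rcond=None)
    return beta


def simulate(n=20_000, seed=1):
    """Hypothetical toy data: true effect tau(x) = 1 + 3x, propensity linear in x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    t = rng.binomial(1, 0.2 + 0.6 * x).astype(float)
    y = 2.0 * x + t * (1.0 + 3.0 * x) + rng.normal(0.0, 1.0, n)
    return x, t, y


if __name__ == "__main__":
    beta = orthogonal_effect(*simulate())
    print("estimated tau(x) = %.2f + %.2f x" % tuple(beta))
```

In this toy setup the outcome model m is deliberately misspecified (the true E[Y | x] is quadratic), yet the effect estimate stays close to 1 + 3x because the loss is first-order insensitive to errors in m when e is well estimated. That insensitivity is the property the orthogonal framework relies on.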

Findings

1. Improved causal forecasting accuracy outside training distribution
2. Effective evaluation using Regression Discontinuity Designs
3. Enhanced model robustness for decision-making applications

Abstract

Deep learning time-series models are often used to make forecasts that inform downstream decisions. Since these decisions can differ from those in the training set, there is an implicit requirement that time-series models will generalize outside of their training distribution. Despite this core requirement, time-series models are typically trained and evaluated on in-distribution predictive tasks. We extend the orthogonal statistical learning framework to train causal time-series models that generalize better when forecasting the effect of actions outside of their training distribution. To evaluate these models, we leverage Regression Discontinuity Designs popular in economics to construct a test set of causal treatment effects.
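As a rough illustration of how a Regression Discontinuity Design yields one causal "ground-truth" effect, the sketch below estimates a sharp RDD jump by fitting a local linear regression on each side of a cutoff with a triangular kernel. It assumes the running variable is time (for example, the day a treatment such as a price change takes effect); the bandwidth, kernel, and simulated series are illustrative assumptions, and `boundary_value`, `rdd_effect`, and `simulate` are hypothetical helpers, not the paper's code.

```python
import numpy as np


def boundary_value(r, y, bandwidth):
    """Triangular-kernel local linear fit of y on r; returns the fit at r = 0."""
    w = np.clip(1.0 - np.abs(r) / bandwidth, 0.0, None)
    keep = w > 0  # only points inside the bandwidth contribute
    sw = np.sqrt(w[keep])
    design = np.column_stack([np.ones(keep.sum()), r[keep]])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y[keep] * sw, rcond=None)
    return coef[0]  # intercept = estimated outcome at the cutoff


def rdd_effect(running, outcome, cutoff, bandwidth):
    """Sharp RDD estimate: jump in E[outcome | running] at the cutoff."""
    r = running - cutoff
    after = r >= 0
    above = boundary_value(r[after], outcome[after], bandwidth)
    below = boundary_value(r[~after], outcome[~after], bandwidth)
    return above - below


def simulate(seed=0):
    """Hypothetical daily series: linear trend plus a -5 level shift on day 100."""
    rng = np.random.default_rng(seed)
    day = np.arange(200, dtype=float)
    sales = 50.0 + 0.1 * day - 5.0 * (day >= 100) + rng.normal(0.0, 0.5, day.size)
    return day, sales


if __name__ == "__main__":
    day, sales = simulate()
    print(round(rdd_effect(day, sales, cutoff=100.0, bandwidth=40.0), 2))
```

Fitting a separate line on each side, rather than comparing raw means, keeps the underlying trend from being mistaken for a treatment effect. Each detected cutoff contributes a single effect estimate, so a test set built this way can be small.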

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

It is great to see that the authors focus on both theoretical results and empirical investigations (for two different cases) to explore their approach. It is mostly an extension of existing ideas and methods (e.g., orthogonal learning) to time series, but it is still very valuable, since time-series learning and forecasting is a very broad field in which any improvement can have a high impact.

Weaknesses

Even though the paper is well written, it is also very compact overall and difficult to read in places. The motivation could be better developed, as could the justification for orthogonal learning. Personally, I felt quite frustrated with the readability of the paper. Besides a number of points that could be improved in terms of presentation (citation style for references and equations), the flow of the paper is difficult to follow in places. This is possibly due to page limita

Reviewer 02 · Rating 6 · Confidence 3

Strengths

While many architectures have been proposed for time-series forecasting in general, causal temporal models have been relatively understudied by comparison. The paper is also well motivated and clearly presented: the use of orthogonal learning is an interesting approach, and the use of the TFT is sensible for incorporating a wider range of inputs. The outline of the RDD evaluation approach is also useful, providing a way to evaluate causal temporal models on live observational data.

Weaknesses

However, the performance improvements do appear slim, with the causal TFT underperforming the causal transformer on all in-distribution time steps and on 2 of the 5 RDD forecast horizons. Where improvements exist, it is also not immediately clear whether they are attributable to the improved loss function or to the use of the TFT, with the novelty of the former far outweighing that of the latter.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. This work has real-world applications and has the potential to improve decision-making processes that depend on forecasts from time-series models.
2. The two extensions that adapt the orthogonal statistical learning framework to train causal models (defining daily treatment effects, and extending the binary treatment effect data model to categorical and linear treatments) are mathematically justified.
3. Despite the complex approach and new terminologies introduced, the paper is well-structured an

Weaknesses

1. Using T_t for the treatment at time t and T_1, T_2 for treatment types can be confusing.
2. The RDD-based evaluation method is complex and may yield only a few test-set examples.

Taxonomy

Topics: Forecasting Techniques and Applications