Right on Time: Revising Time Series Models by Constraining their Explanations

Maurice Kraus; David Steinmann; Antonia Wüst; Andre Kokozinski; Kristian Kersting

arXiv:2402.12921·cs.LG·June 16, 2025·1 cites

TL;DR

This paper introduces RioT, a method that improves the reliability of deep time series models by constraining their explanations in both time and frequency domains to avoid reliance on spurious correlations.

Contribution

RioT is a novel approach that lets users interact with model explanations in both the time and frequency domains, reducing shortcut learning in time series models.

Findings

01. RioT effectively guides models toward more reliable decisions.

02. Demonstrated on multiple datasets including real-world production line data.

03. Improves model robustness against spurious correlations.

Abstract

Deep time series models often suffer from reliability issues due to their tendency to rely on spurious correlations, leading to incorrect predictions. To mitigate such shortcuts and prevent "Clever-Hans" moments in time series models, we introduce Right on Time (RioT), a novel method that enables interacting with model explanations across both the time and frequency domains. By incorporating feedback on explanations in both domains, RioT constrains the model, steering it away from annotated spurious correlations. This dual-domain interaction strategy is crucial for effectively addressing shortcuts in time series datasets. We empirically demonstrate the effectiveness of RioT in guiding models toward more reliable decision-making across popular time series classification and forecasting datasets, as well as our newly recorded dataset with naturally occurring shortcuts, P2S, collected from…
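To make the dual-domain idea concrete, here is a minimal sketch of a "right reason" style penalty applied to an explanation in both the time and the frequency domain. Everything in it is an illustrative assumption rather than the authors' implementation: the linear model, the input-times-gradient attribution, the plain DFT, the binary annotation masks, and the names `attribution`, `time_penalty`, `freq_penalty`, `right_reason_loss`, `lam_time`, and `lam_freq` are all hypothetical. The abstract does not specify the explainer or the loss weighting used in the paper.

```python
import cmath


def attribution(weights, x):
    # Assumed explainer: input-times-gradient for a linear model f(x) = sum(w_t * x_t).
    return [w * v for w, v in zip(weights, x)]


def dft(values):
    # Plain O(n^2) discrete Fourier transform; enough for a short illustration.
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def time_penalty(attr, time_mask):
    # Squared attribution on time steps annotated as spurious (mask value 1).
    return sum((m * a) ** 2 for m, a in zip(time_mask, attr))


def freq_penalty(attr, freq_mask):
    # Squared magnitude of the attribution spectrum on annotated frequency bins.
    spectrum = dft(attr)
    return sum(m * abs(c) ** 2 for m, c in zip(freq_mask, spectrum))


def right_reason_loss(task_loss, attr, time_mask, freq_mask,
                      lam_time=1.0, lam_freq=1.0):
    # Task loss plus weighted penalties; the weights are hypothetical hyperparameters.
    return (task_loss
            + lam_time * time_penalty(attr, time_mask)
            + lam_freq * freq_penalty(attr, freq_mask))


# Hypothetical example: the last time step is annotated as a shortcut.
attr = attribution([0.5, 0.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0])
loss = right_reason_loss(0.3, attr, time_mask=[0, 0, 0, 1], freq_mask=[0, 0, 0, 0])
```

In a real training loop the penalty would be differentiated with respect to the model parameters, so that minimising it pushes attribution mass away from the annotated time steps or frequency bins. A frequency mask is useful when the shortcut is spread over the whole series (for example a periodic disturbance) and cannot be localised by a time mask.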

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

## Originality
- The application of RRR methods to time series data is novel.
- Introducing two RRR loss terms for spatial and time domains is novel.
- The P2S dataset is a new contribution, featuring confounder annotations that can be utilized for future method evaluations.
- The unique approach to adding confounders to the UCR/UEA datasets is original; this dataset can serve a similar purpose to DomainBed in image analysis (https://github.com/facebookresearch/DomainBed) for future model evalu

Weaknesses

## Novelty:
- RioT itself is not novel, it is just applying RRR to time series data.

## Clarity:
- Some minor clarity issues; please see the question section.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. High practical relevance: The focus on confounders in industrial sensor data highlights the method's real-world applicability.
2. Novel dual-domain feedback: Operating across both time and frequency domains is an innovative extension to existing XIL paradigms.
3. Empirical rigor: The experiments span across multiple datasets, models, and configurations, adding robustness to the findings.

Weaknesses

1. Dependency on human feedback: The method requires expert annotations, which may be costly and challenging to acquire at scale.
2. Training cost: Incorporating feedback through gradient-based explanations adds computational overhead.
3. Limited exploration of feedback noise impact: Although robustness tests are performed, more exploration of noisy annotations could be useful to assess real-world viability.

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. The paper is well structured.
2. The paper is clearly written.
3. The authors conduct various experiments including different model architectures and downstream applications to evaluate their method.
4. The authors provide their code to support reproducibility.

Weaknesses

1. Table 1 to Table 4 as well as the statement "we evaluate on [...] popular datasets [...] with 70% / 30% train / test splits" (ll. 268-269) indicate that the authors did not use a validation set, i.e. tuned their models on the test set. The authors should follow good practice and adhere to established procedures, e.g. as introduced by [1] for forecasting (train/val/test split of 60/20/20 for ETTm1, and 70/10/20 for Energy and Weather).
2. The proposed approach requires human feedback for eac
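The train/validation/test protocol requested in point 1 can be sketched as a chronological split, so that hyperparameters are tuned on a held-out validation segment rather than on the test set. This is a generic illustration: the function name `chronological_split` and the toy series are hypothetical, and only the 60/20/20 and 70/10/20 ratios come from the review text.

```python
def chronological_split(series, train_frac, val_frac):
    # Split in temporal order (no shuffling) so validation and test data
    # always lie after the training data on the time axis.
    n = len(series)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]


# Toy series standing in for a forecasting dataset (hypothetical data).
series = list(range(100))

# 60/20/20, the ratio the reviewer cites for ETTm1.
train, val, test = chronological_split(series, 0.6, 0.2)

# 70/10/20, the ratio the reviewer cites for Energy and Weather.
train2, val2, test2 = chronological_split(series, 0.7, 0.1)
```

Model selection and the choice of any feedback-penalty weights would then use `val` only, with `test` touched once for the final report.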

Code & Models

Repositories

ml-research/riot
pytorch · Official

Datasets

AIML-TUDA/P2S
dataset · 8 dl

Taxonomy

Topics · Time Series Analysis and Forecasting