TimeSAE: Sparse Decoding for Faithful Explanations of Black-Box Time Series Models

Khalid Oublal; Quentin Bouniot; Qi Gan; Stephan Clémençon; Zeynep Akata

arXiv:2601.09776 · cs.LG · January 16, 2026

Open Access · 3 Reviews

TL;DR

TimeSAE introduces a novel sparse autoencoder framework for explaining black-box time series models, emphasizing robustness to distributional shifts and providing faithful interpretations in high-stakes applications.

Contribution

The paper proposes TimeSAE, a new explanation framework combining sparse autoencoders and causality, improving robustness and faithfulness over existing methods for time series model explanations.

Findings

1. TimeSAE outperforms baselines in faithfulness and robustness.
2. It provides explanations that generalize beyond the training distribution.
3. Extensive evaluations on synthetic and real datasets validate its effectiveness.

Abstract

As black-box and pretrained models gain traction in time series applications, understanding and explaining their predictions becomes increasingly vital, especially in high-stakes domains where interpretability and trust are essential. However, most existing methods provide only in-distribution explanations and do not generalize outside the training support, a setting that demands explainers capable of generalization. In this work, we aim to provide a framework to explain black-box models for time series data through the dual lenses of Sparse Autoencoders (SAEs) and causality. We show that many current explanation methods are sensitive to distributional shifts, limiting their effectiveness in real-world scenarios. Building on the concept of Sparse Autoencoders, we introduce TimeSAE, a framework for black-box model explanation. We conduct extensive evaluations of TimeSAE on both…
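For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of a generic sparse autoencoder applied to fixed-length time series windows. It is not the TimeSAE architecture or objective: the function names (`sae_forward`, `sae_loss`), the ReLU encoder, the L1 penalty, the weight `l1_weight`, and all shapes are assumptions chosen purely for illustration.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode windows into non-negative codes, then decode them back.

    Hypothetical helper for illustration; not taken from the paper.
    """
    z = np.maximum(0.0, x @ W_enc + b_enc)  # ReLU keeps codes non-negative
    x_hat = z @ W_dec + b_dec               # linear reconstruction
    return z, x_hat


def sae_loss(x, x_hat, z, l1_weight=1e-3):
    """Reconstruction error plus an L1 penalty that pushes codes toward zero."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_weight * sparsity


# Toy data: 8 windows of length 16, mapped to an overcomplete code of size 32.
rng = np.random.default_rng(0)
n_windows, window_len, n_concepts = 8, 16, 32
x = rng.standard_normal((n_windows, window_len))

W_enc = 0.1 * rng.standard_normal((window_len, n_concepts))
b_enc = np.zeros(n_concepts)
W_dec = 0.1 * rng.standard_normal((n_concepts, window_len))
b_dec = np.zeros(window_len)

z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, x_hat, z)
```

In a sketch like this, each column of `z` plays the role of a candidate "concept" whose activation can be inspected per window; how TimeSAE defines, regularizes, and intervenes on its concepts is described in the paper itself and is not reproduced here.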

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The integration of causal concept effects with a Sparse Autoencoder framework is a novel and promising approach for generating explanations. The proposed compositional consistency loss is a thoughtful addition to address the challenge of out-of-distribution generalization, enhancing the model's robustness. The paper is supported by extensive experiments on both synthetic and real-world datasets, which effectively demonstrate the performance of the proposed method against relevant baselines.

Weaknesses

1. **Interpretation of Theoretical Results**: While the inclusion of a theorem on order-faithfulness is appreciated, the paper would benefit from a more in-depth discussion of its practical implications and its significance within the broader literature on explainability.
2. **Rationale for Multiple Sparsity Mechanisms**: The paper employs multiple forms of sparsity (e.g., row-wise normalization, an SAE loss term, and Bernoulli masks). The motivation for including each of these specific mechani

Reviewer 02 · Rating 4 · Confidence 5

Strengths

The paper makes several combined contributions in the field of time series XAI:
* Targeted solution to core time series XAI challenges. Even though previous work has been mentioned, TimeSAE directly addresses three critical pain points: OOD generalization, causal faithfulness, and dead concepts. These design choices are well-motivated by gaps in prior work and align with real-world needs for robust explanations.
* The inclusion of Theorem 1 (proving order-preserving causal faithfulness) enhances

Weaknesses

* The paper successfully demonstrates that its learned concepts are useful for generating faithful explanations at a quantitative level. However, it is less clear how a human user would interpret these learned concepts. Section 3.4 mentions CARs and a functional ANOVA decomposition for interpretation, but these feel somewhat disconnected from the main method.
* Hyperparameter Sensitivity and Complexity: The final objective function (Eq. 10) is a weighted sum of four different loss terms, involv

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. This paper provides a novel framework that integrates Sparse Autoencoders with causal counterfactuals for time series. TimeSAE uniquely combines sparse concept learning with counterfactual generation via interventions on latent concepts. It couples sparsity with a contrastive counterfactual loss that explicitly encourages CaCE order preservation.
2. This paper conducts comprehensive empirical validation across settings. Evaluation spans synthetic (FreqShapes, SeqComb-UV) and real-world d

Weaknesses

1. The theoretical assumptions and equations have some issues.
   - The mathematical symbols used in this article are very confusing and difficult to read. For example, $g$ is the decoder and the generator at the same time. $\mathcal{E}$ is the encoder, the explainer, and the counterfactual explanation in Eq. 4. $\tilde{x}$ is the input in Theorem 1 but an embedding in Eq. 4.
   - Theorem 1 assumes a small approximation error $\epsilon_{cf}$ (Eq. (5)), but no experiment quantifies this error or validates i

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare