Why Do Class-Dependent Evaluation Effects Occur with Time Series Feature Attributions? A Synthetic Data Investigation
Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp

TL;DR
This study investigates why class-dependent evaluation effects occur in time series feature attribution methods, revealing that perturbation-based metrics often contradict ground truth assessments and highlighting the need for more reliable evaluation approaches.
Contribution
The paper systematically analyzes the emergence of class-dependent evaluation effects in synthetic time series data, demonstrating discrepancies between evaluation metrics and ground truth.
Findings
Class-dependent effects occur even with simple, localized features.
Perturbation and ground truth metrics often give contradictory results.
Weak correlation exists between different attribution evaluation approaches.
Abstract
Evaluating feature attribution methods represents a critical challenge in explainable AI (XAI), as researchers typically rely on perturbation-based metrics when ground truth is unavailable. However, recent work reveals that these evaluation metrics can show different performance across predicted classes within the same dataset. These "class-dependent evaluation effects" raise questions about whether perturbation analysis reliably measures attribution quality, with direct implications for XAI method development and evaluation trustworthiness. We investigate under which conditions these class-dependent effects arise by conducting controlled experiments with synthetic time series data where ground truth feature locations are known. We systematically vary feature types and class contrasts across binary classification tasks, then compare perturbation-based degradation scores with ground…
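The comparison the abstract describes, a perturbation-based degradation score set against a ground-truth localization score and split by predicted class, can be illustrated with a toy sketch. This is not the authors' experimental setup. The bump feature, the hand-set logistic classifier, the absolute gradient-times-input attribution, the zero-value perturbation baseline, and every name below (`prob_class1`, `evaluate`, `gt_mask`, `start`, `width`) are assumptions chosen only to keep the example small and deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 100, 200          # series length, number of samples
start, width = 40, 10    # hypothetical ground-truth feature location

# Class 0: noise only. Class 1: noise plus a localized bump.
X = rng.normal(0.0, 0.5, size=(n, T))
y = np.repeat([0, 1], n // 2)
X[y == 1, start:start + width] += 2.0

gt_mask = np.zeros(T, dtype=bool)
gt_mask[start:start + width] = True

# Stand-in classifier (assumption): logistic model with hand-set weights
# that sums the feature window and thresholds halfway between class means.
w = gt_mask.astype(float)
b = -1.0 * width


def prob_class1(x):
    """Probability of class 1 under the hand-set logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def evaluate(X, k):
    """Per-predicted-class degradation and ground-truth precision."""
    p1 = prob_class1(X)
    pred = (p1 >= 0.5).astype(int)

    # Attribution (assumption): absolute gradient-times-input of the logit.
    attr = np.abs(X * w)
    topk = np.argsort(-attr, axis=1)[:, :k]

    # Perturbation (assumption): replace the top-k time steps with zero.
    X_pert = X.copy()
    np.put_along_axis(X_pert, topk, 0.0, axis=1)
    p1_pert = prob_class1(X_pert)

    # Degradation: drop in the probability of the originally predicted class.
    p_pred = np.where(pred == 1, p1, 1.0 - p1)
    p_pred_pert = np.where(pred == 1, p1_pert, 1.0 - p1_pert)
    degradation = p_pred - p_pred_pert

    # Ground-truth precision: share of top-k steps inside the known window.
    precision = gt_mask[topk].mean(axis=1)

    return {
        c: {
            "degradation": float(degradation[pred == c].mean()),
            "gt_precision": float(precision[pred == c].mean()),
        }
        for c in (0, 1)
    }


results = evaluate(X, k=width)
for c, scores in results.items():
    print(c, scores)
```

In this toy setting the attribution locates the feature window equally well for both predicted classes, yet zeroing that window only changes the prediction for class 1, because a zeroed window already looks like class 0. The two scores therefore disagree by class purely as a consequence of the assumed baseline and class contrast, which is the kind of discrepancy the summary above refers to.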
