Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
Minh Hoang Nguyen, Linh Le Pham Van, Thommen George Karimpanal, Sunil Gupta, Hung Le

TL;DR
This paper introduces CRDT, a novel decision transformer framework that uses counterfactual reasoning to improve decision-making and generalization in reinforcement learning, especially with limited or suboptimal data.
Contribution
CRDT is the first to incorporate counterfactual reasoning into decision transformers, enabling reasoning beyond known data without architectural changes.
Findings
CRDT outperforms traditional decision transformers on Atari and D4RL benchmarks.
CRDT demonstrates improved performance with limited and altered data scenarios.
CRDT enables stitching of suboptimal trajectories through counterfactual reasoning.
Abstract
Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, the lack of training data and the scarcity of optimal behaviours make training on offline datasets challenging, as suboptimal data can hinder performance. To address this, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning. CRDT enhances DT's ability to reason beyond known data by generating and utilizing counterfactual experiences, enabling improved decision-making in unseen scenarios. Experiments across Atari and D4RL benchmarks, including scenarios with limited data and altered dynamics, demonstrate that CRDT outperforms conventional DT approaches.…
Peer Reviews
Decision·Submitted to ICLR 2025
Novel Concept: To the best of my knowledge, this is the first attempt to incorporate counterfactual reasoning into DT training, which adds a novel approach to decision transformers.
- Weak Connection to Causal Inference: Although inspired by causality, the method lacks a clear link to causal inference concepts such as causal effects. The approach mainly describes how to generate uncertain trajectories and incorporate them into DT training.
- Marginal Novelty: The proposed method does not make a strong theoretical contribution beyond its counterfactual data generation process, and the method itself is not novel.
- Ground-Truth Validity of Counterfactual Traject
1. The paper identifies a limitation of DT: it can underperform when optimal trajectories are scarce or the data is biased toward suboptimal trajectories.
2. CRDT enables the agent to reason beyond known data by generating counterfactual experiences, integrating the causal inference framework (particularly the potential outcomes approach) with reinforcement learning. CRDT shows better performance on standard benchmarks compared to traditional methods and DT.
1. The CRDT framework introduces several new hyperparameters (e.g., number of counterfactual actions, uncertainty threshold, and the number of experiences), which may require extensive tuning for different environments, potentially hindering practical applicability.
2. It is unclear to me how the treatment and outcome models are utilized during inference at test time, as the sections before the experiment sections focus primarily on how to train them. Providing detailed explanations of the infer
- The CRDT framework aims to address the limitations of the stitching ability of DT by leveraging counterfactual reasoning to generate and filter the counterfactual trajectories without changing the DT's architecture.
- The methodology is inspired by two criteria: high accumulated return and high prediction confidence, which ensures the generated counterfactual trajectories are meaningful and beneficial for the DT's training.
- The empirical results show that the proposed method outperforms the
- The paper provides a detailed explanation of the CRDT framework but lacks theoretical analysis of the counterfactual reasoning and its impact on the DT's performance.
- The implementation of the CRDT framework involves training two separate transformer models (Treatment and Outcome models), which increases computational and training costs.
- Although this method outperforms existing baseline methods in the average score in the D4RL benchmark and Atari games, the improvements are n
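The reviews above mention that generated counterfactual trajectories are filtered by two criteria, high accumulated return and high prediction confidence, and that an uncertainty threshold and the number of experiences are hyperparameters. The following is a minimal sketch of such a filter, not the authors' implementation. Every name in it (`Candidate`, `filter_candidates`, `baseline_return`, `max_uncertainty`, `max_keep`) is hypothetical, and it assumes that uncertainty is available as a single scalar per candidate and that "high return" means exceeding the return of the original trajectory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A hypothetical counterfactual rollout (names are illustrative only)."""
    rewards: List[float]   # per-step rewards predicted for the rollout
    uncertainty: float     # assumed scalar; higher means lower confidence


def accumulated_return(rewards: List[float]) -> float:
    """Undiscounted sum of rewards along the rollout."""
    return sum(rewards)


def filter_candidates(
    candidates: List[Candidate],
    baseline_return: float,   # assumed: return of the original trajectory
    max_uncertainty: float,   # assumed form of the uncertainty threshold
    max_keep: int,            # assumed cap on the number of kept experiences
) -> List[Candidate]:
    """Keep confident candidates that beat the baseline, best return first."""
    kept = [
        c for c in candidates
        if c.uncertainty <= max_uncertainty
        and accumulated_return(c.rewards) > baseline_return
    ]
    kept.sort(key=lambda c: accumulated_return(c.rewards), reverse=True)
    return kept[:max_keep]
```

In this sketch a candidate with a very high predicted return is still discarded if its uncertainty exceeds the threshold, which reflects the idea that both criteria must hold at once. How CRDT actually measures confidence and selects trajectories is not specified in this summary.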
Taxonomy
Topics: Big Data and Business Intelligence
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Absolute Position Encodings · Residual Connection
