Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation
Xiaocong Chen, Siyu Wang, Lina Yao

TL;DR
This paper introduces EDT4Rec, a novel offline reinforcement learning method for recommendation systems that enhances exploration and trajectory stitching using maximum entropy principles and reward relabeling, outperforming existing approaches.
Contribution
The paper proposes a new Decision Transformer variant that combines maximum-entropy exploration with reward relabeling to improve both offline and online recommendation performance.
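To make the two ingredients concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not describe the paper's relabeling rule or its exact objective. The sketch assumes the standard Decision Transformer convention of conditioning on returns-to-go (recomputed after rewards are relabeled) and a generic entropy bonus subtracted from a next-item cross-entropy loss. The function names and the `weight` parameter are hypothetical.

```python
import math


def returns_to_go(rewards):
    """Suffix sums of a reward sequence: rtg[t] = rewards[t] + ... + rewards[-1].

    Assumption: after rewards are relabeled, the conditioning signal would be
    recomputed this way, as in the standard Decision Transformer setup.
    """
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_regularized_loss(logits, target, weight=0.1):
    """Cross-entropy on the logged item minus a weighted policy-entropy bonus.

    Assumption: a generic maximum-entropy regularizer; `weight` is a
    hypothetical trade-off coefficient, not a value from the paper.
    """
    probs = softmax(logits)
    nll = -math.log(probs[target])
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return nll - weight * entropy
```

A larger `weight` rewards policies that spread probability over more items, which is the usual mechanism by which entropy regularization encourages exploration.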
Findings
Outperforms existing methods on six real-world datasets
Enhances online exploration capabilities
Improves trajectory stitching in recommendation tasks
Abstract
Reinforcement learning-based recommender systems have recently gained popularity. However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains. To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for their data-driven approach utilizing offline datasets. A prominent example of this is the Decision Transformer. Despite its popularity, the Decision Transformer approach has inherent drawbacks, particularly evident in recommendation methods based on it. This paper identifies two key shortcomings in existing Decision Transformer-based methods: a lack of stitching capability and limited effectiveness in online adoption. In response, we introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for…
Taxonomy
Topics: Neural Networks and Applications · Advanced Bandit Algorithms Research · Recommender Systems and Techniques
Methods: Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention
