Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation

Xiaocong Chen; Siyu Wang; Lina Yao

arXiv:2406.00725 · cs.IR · June 4, 2024

Open Access

TL;DR

This paper introduces EDT4Rec, a novel offline reinforcement learning method for recommendation systems that enhances exploration and trajectory stitching using maximum entropy principles and reward relabeling, outperforming existing approaches.

Contribution

The paper proposes a new Decision Transformer variant that adds maximum-entropy exploration and reward relabeling to improve both offline and online recommendation performance.

Findings

01 Outperforms existing methods on six real-world datasets

02 Enhances online exploration capabilities

03 Improves trajectory stitching in recommendation tasks

Abstract

Reinforcement learning-based recommender systems have recently gained popularity. However, due to the typical limitations of simulation environments (e.g., data inefficiency), most of the work cannot be broadly applied in all domains. To counter these challenges, recent advancements have leveraged offline reinforcement learning methods, notable for their data-driven approach utilizing offline datasets. A prominent example of this is the Decision Transformer. Despite its popularity, the Decision Transformer approach has inherent drawbacks, particularly evident in recommendation methods based on it. This paper identifies two key shortcomings in existing Decision Transformer-based methods: a lack of stitching capability and limited effectiveness in online adoption. In response, we introduce a novel methodology named Max-Entropy enhanced Decision Transformer with Reward Relabeling for…

Taxonomy

Topics: Neural Networks and Applications · Advanced Bandit Algorithms Research · Recommender Systems and Techniques

Methods: Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention