Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning
Weichen Li, Rati Devidze, Sophie Fellenz

TL;DR
This paper adapts the soft actor-critic (SAC) reinforcement learning algorithm to text-based adventure games and combines it with reward shaping to handle sparse rewards. The result is more stable training, faster learning, and higher scores.
Contribution
It adapts SAC to text-based games and combines it with potential-based reward shaping, which turns sparse environment rewards into denser, more informative learning signals.
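As a rough illustration of potential-based reward shaping in its standard form, the sketch below adds the term gamma * Phi(s') - Phi(s) to the environment reward. The function name, the default discount factor, and the convention of treating the potential of a terminal state as zero are assumptions made for this example. This summary does not say which potential function the paper uses, so the potentials are passed in as plain numbers.

```python
def shaped_reward(reward: float,
                  potential_s: float,
                  potential_next: float,
                  gamma: float = 0.99,
                  done: bool = False) -> float:
    """Return r + gamma * Phi(s') - Phi(s).

    potential_s and potential_next are the values of a hypothetical
    potential function Phi at the current and next state.
    Assumption: the potential of a terminal state is treated as zero.
    """
    next_term = 0.0 if done else gamma * potential_next
    return reward + next_term - potential_s


def discounted_shaping_sum(potentials: list[float], gamma: float) -> float:
    """Sum the discounted shaping terms along one episode.

    potentials[t] is Phi(s_t); the episode ends after the last listed state.
    The terms telescope, so the result should equal -Phi(s_0).
    """
    total = 0.0
    for t, phi in enumerate(potentials):
        is_last = t == len(potentials) - 1
        phi_next = 0.0 if is_last else potentials[t + 1]
        total += (gamma ** t) * shaped_reward(0.0, phi, phi_next, gamma, done=is_last)
    return total
```

Because the shaping terms telescope along a trajectory, they only shift the return by a quantity that depends on the start state, which is why this form of shaping is generally expected to leave the optimal policy unchanged while giving the agent a signal at every step.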
Findings
SAC outperforms Q-learning on many text-based games.
Reward shaping accelerates policy learning and improves scores.
On many games, SAC reaches these higher scores with only half the number of training steps used by the Q-learning methods.
Abstract
Text-based games are a popular testbed for language-based reinforcement learning (RL). In previous work, deep Q-learning is commonly used as the learning agent. Q-learning algorithms are challenging to apply to complex real-world domains due to, for example, their instability in training. Therefore, in this paper, we adapt the soft-actor-critic (SAC) algorithm to the text-based environment. To deal with sparse extrinsic rewards from the environment, we combine it with a potential-based reward shaping technique to provide more informative (dense) reward signals to the RL agent. We apply our method to play difficult text-based games. The SAC method achieves higher scores than the Q-learning methods on many games with only half the number of training steps. This shows that it is well-suited for text-based games. Moreover, we show that the reward shaping technique helps the agent to learn…
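To make the "maximum entropy" part of the approach concrete, here is a minimal sketch of the soft state value for a discrete action set, V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)), which is the standard quantity in discrete-action SAC. The function names and the temperature value are assumptions for illustration. The paper's actual networks and losses are not described in this summary.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(q_values: list[float],
                     probs: list[float],
                     alpha: float) -> float:
    """Expected Q-value plus alpha times the policy entropy.

    alpha is the entropy temperature (a hypothetical value is chosen by the
    caller); actions with zero probability contribute nothing to the sum.
    """
    return sum(p * (q - alpha * math.log(p))
               for p, q in zip(probs, q_values) if p > 0.0)


# Example: two candidate text commands with equal Q-values.
probs = softmax([0.0, 0.0])                 # uniform policy over both commands
value = soft_state_value([1.0, 1.0], probs, alpha=1.0)
```

With alpha set to zero the entropy bonus vanishes and the value reduces to the ordinary expected Q-value. A larger alpha rewards policies that keep more actions plausible, which encourages exploration.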
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing
Methods: Global Average Pooling · 1x1 Convolution · Dilated Convolution · Average Pooling · Convolution · Q-Learning · Switchable Atrous Convolution
