Eye of the Beholder: Improved Relation Generalization for Text-based Reinforcement Learning Agents

Keerthiram Murugesan; Subhajit Chaudhury; Kartik Talamadupula

arXiv:2106.05387 · cs.LG · June 16, 2021

TL;DR

This paper augments text-based reinforcement learning agents with visual representations of game states to improve their understanding of objects and relations, leading to improved game performance.

Contribution

The paper proposes retrieving images that represent the text observations of a game and training agents on them, a multimodal approach to improving relation generalization in text-based RL agents.

Findings

01 Improved agent performance in text-based games.

02 Enhanced relation understanding through visual data.

03 Better generalization across different game scenarios.

Abstract

Text-based games (TBGs) have become a popular proving ground for the demonstration of learning-based agents that make decisions in quasi real-world settings. The crux of the problem for a reinforcement learning agent in such TBGs is identifying the objects in the world, and those objects' relations with that world. While the recent use of text-based resources for increasing an agent's knowledge and improving its generalization has shown promise, we posit in this paper that there is much yet to be learned from visual representations of these same worlds. Specifically, we propose to retrieve images that represent specific instances of text observations from the world and train our agents on such images. This improves the agent's overall understanding of the game 'scene' and objects' relationships to the world around them, and the variety of visual representations on offer allows the agent…
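
The abstract describes retrieving images for text observations and training the agent on them, but this page does not say how the two modalities are combined. The following is only a minimal sketch of one plausible reading, not the authors' implementation: a text observation is encoded as bag-of-words counts, a stand-in "retrieval" step looks up toy visual vectors for the objects mentioned, and the two are concatenated into one state vector. Every name here (`TOY_IMAGE_FEATURES`, `text_features`, `retrieve_image_features`, `fused_state`) and the choice of concatenation are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for image retrieval: maps an object word to a small
# fixed "visual" feature vector. A real system would fetch an image and encode
# it with a vision model; these numbers are made up for illustration.
TOY_IMAGE_FEATURES: Dict[str, List[float]] = {
    "apple": [1.0, 0.0],
    "fridge": [0.0, 1.0],
}
IMAGE_DIM = 2


def tokenize(observation: str) -> List[str]:
    """Lowercase the observation and split on whitespace, dropping periods."""
    return observation.lower().replace(".", " ").split()


def text_features(observation: str, vocab: List[str]) -> List[float]:
    """Bag-of-words counts over a small fixed vocabulary."""
    tokens = tokenize(observation)
    return [float(tokens.count(word)) for word in vocab]


def retrieve_image_features(observation: str) -> List[float]:
    """Average the toy visual vectors of every known object in the text."""
    hits = [TOY_IMAGE_FEATURES[t] for t in tokenize(observation)
            if t in TOY_IMAGE_FEATURES]
    if not hits:
        return [0.0] * IMAGE_DIM  # no known object: fall back to zeros
    return [sum(vec[i] for vec in hits) / len(hits) for i in range(IMAGE_DIM)]


def fused_state(observation: str, vocab: List[str]) -> List[float]:
    """Concatenate text and (assumed) visual features into one state vector."""
    return text_features(observation, vocab) + retrieve_image_features(observation)


if __name__ == "__main__":
    vocab = ["apple", "fridge", "table"]
    print(fused_state("You see an apple in the fridge.", vocab))
```

In this sketch the fused vector would then be fed to whatever policy or value network the agent uses; that part is omitted because the page gives no detail about it.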

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques