Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom

Hugo Huang

arXiv:2511.11703·cs.LG·November 18, 2025

Open Access

TL;DR

This paper introduces two semantic-segmentation-based input representations for reinforcement learning in 3D environments: SS-only, which cuts the memory consumption of memory buffers by up to 98.6%, and RGB+SS, which improves agent performance in ViZDoom deathmatches.

Contribution

The paper proposes two novel semantic segmentation input representations for RL in 3D environments, SS-only and RGB+SS, and demonstrates substantial memory savings with the former and performance improvements with the latter.

Findings

01

SS-only reduces the memory consumption of memory buffers by at least 66.6%, and by up to 98.6% when run-length encoding is applied.

02

RGB+SS improves RL agent performance by adding semantic information to the RGB input.

03

Density-based heatmaps effectively visualize agent movement.
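
This page does not describe how the paper builds its heatmaps, so the following is only a generic sketch of the idea: bin an agent's recorded 2D positions into a grid and normalise the counts into a visitation density. The function name `movement_heatmap`, the map bounds, the bin count, and the toy trajectory are all hypothetical and are not taken from the paper.

```python
import numpy as np

def movement_heatmap(positions, bounds, bins=4):
    """Return a (bins x bins) grid of visit fractions for 2D positions.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    positions = np.asarray(positions, dtype=float)
    (x_min, x_max), (y_min, y_max) = bounds
    # Count how many recorded positions fall into each grid cell.
    counts, _, _ = np.histogram2d(
        positions[:, 0],
        positions[:, 1],
        bins=bins,
        range=[[x_min, x_max], [y_min, y_max]],
    )
    total = counts.sum()
    # Normalise to fractions so maps from episodes of different length compare.
    return counts / total if total > 0 else counts

# Toy trajectory (assumed data): two steps in one cell, then two other cells.
trajectory = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (3.5, 3.5)]
heat = movement_heatmap(trajectory, bounds=((0, 4), (0, 4)), bins=4)
```

The resulting array can be passed to any image plotting routine; cells with higher values are those the agent occupied more often.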

Abstract

Reinforcement learning (RL) in 3D environments with high-dimensional sensory input poses two major challenges: (1) the high memory consumption induced by memory buffers required to stabilise learning, and (2) the complexity of learning in partially observable Markov Decision Processes (POMDPs). This project addresses these challenges by proposing two novel input representations: SS-only and RGB+SS, both employing semantic segmentation on RGB colour images. Experiments were conducted in deathmatches of ViZDoom, using perfect segmentation results for controlled evaluation. Our results showed that SS-only reduced the memory consumption of memory buffers by at least 66.6%, and by up to 98.6% when a vectorisable lossless compression technique with minimal overhead, such as run-length encoding, is applied. Meanwhile, RGB+SS significantly enhances RL agents' performance with the…
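
To make the compression step concrete, here is a minimal sketch of vectorised run-length encoding applied to a segmentation frame, written with NumPy. It is an illustration under stated assumptions, not the paper's implementation: the helper names `rle_encode` and `rle_decode`, the single-channel `uint8` class-ID layout, and the toy frame are all hypothetical. Segmentation maps tend to contain long runs of identical class IDs, which is why this kind of lossless scheme can compress them well.

```python
import numpy as np

def rle_encode(mask):
    """Losslessly encode a label map as (run values, run lengths).

    Hypothetical helper for illustration; uses only vectorised NumPy ops.
    """
    flat = np.asarray(mask).ravel()
    if flat.size == 0:
        return flat.copy(), np.zeros(0, dtype=np.int64)
    # Positions where the label differs from the previous one start a new run.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    starts = np.concatenate(([0], change))
    # Run length is the distance from each start to the next start (or the end).
    lengths = np.diff(np.concatenate((starts, [flat.size])))
    return flat[starts], lengths

def rle_decode(values, lengths, shape):
    """Rebuild the original label map from its run-length encoding."""
    return np.repeat(values, lengths).reshape(shape)

# Toy 4x6 frame (assumed data): background class 0 with one object of class 3.
frame = np.zeros((4, 6), dtype=np.uint8)
frame[1:3, 2:5] = 3

values, lengths = rle_encode(frame)
restored = rle_decode(values, lengths, frame.shape)
```

In this toy case the 24 labels collapse into 5 runs, and decoding reproduces the frame exactly. The actual saving on real frames depends on scene content and on how the run values and lengths are stored.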

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Human Pose and Action Recognition