Do Autonomous Agents Benefit from Hearing?
Abraham Woubie, Anssi Kanervisto, Janne Karttunen, and Ville Hautamaki

TL;DR
This paper investigates how incorporating audio cues alongside visual data in deep reinforcement learning enhances an agent's ability to perform reach-the-goal tasks, demonstrating improved behavior with multimodal sensory input.
Contribution
It introduces a multi-modal approach combining audio and visual information for reinforcement learning agents, showing benefits over visual-only methods.
Findings
Agents perform better with combined audio-visual input.
Audio cues help in reaching goals outside visual range.
Multimodal sensing improves overall agent behavior.
Abstract
Mapping states to actions in deep reinforcement learning is mainly based on visual information. The common approach to handling visual information is to extract pixels from images and use them as the state representation for the reinforcement learning agent. However, any vision-only agent is handicapped by its inability to sense audible cues. Using hearing, animals can sense targets that are outside their visual range. In this work, we propose using audio as information complementary to vision in the state representation. We assess the impact of this multi-modal setup on reach-the-goal tasks in the ViZDoom environment. Results show that the agent improves its behavior when visual information is accompanied by audio features.
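To make the idea of an audio-visual state representation concrete, here is a minimal sketch of how pixel data and audio features might be joined into a single state vector. The abstract does not specify the feature extraction or how the two modalities are fused, so everything here is an assumption for illustration: the function names (`visual_features`, `audio_features`, `build_state`), the band-averaged log-magnitude spectrum, the frame size, and the simple concatenation are all hypothetical and are not the authors' method.

```python
import numpy as np


def visual_features(frame: np.ndarray) -> np.ndarray:
    """Scale an 8-bit grayscale frame to [0, 1] and flatten it."""
    return (frame.astype(np.float32) / 255.0).ravel()


def audio_features(samples: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Hypothetical audio descriptor (an assumption, not from the paper):
    the magnitude spectrum averaged into n_bins bands, then log-compressed."""
    spectrum = np.abs(np.fft.rfft(samples.astype(np.float32)))
    bands = np.array_split(spectrum, n_bins)
    band_means = np.array([band.mean() for band in bands], dtype=np.float32)
    return np.log1p(band_means)


def build_state(frame: np.ndarray, samples: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Concatenate visual and audio features into one state vector."""
    return np.concatenate([visual_features(frame), audio_features(samples, n_bins)])


# Synthetic stand-ins for a game frame and an audio buffer (illustrative sizes).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(60, 80), dtype=np.uint8)
samples = rng.standard_normal(1024)

state = build_state(frame, samples)
visual_only = visual_features(frame)
```

In this sketch the audio-visual state is simply the visual-only state with the audio descriptor appended, so a vision-only baseline can be obtained by dropping the last `n_bins` entries.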
Taxonomy
Topics: Neural dynamics and brain function · Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
