Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention
Hou Pong Chan, Mingxi Guo, Cheng-Zhong Xu

TL;DR
This paper introduces a novel layer fusion method with region-specific dynamic layer attention to improve language grounding for autonomous vehicles, leading to more accurate localization of the region referred to by a natural language command.
Contribution
It proposes the first layer fusion approach for this task, combined with region-specific dynamic (RSD) layer attention, to improve scene understanding in language grounding for autonomous vehicles.
Findings
Outperforms state-of-the-art methods on the Talk2Car benchmark
Predicts the region referred to by the command more accurately
Improves understanding of the input scene and command by using features from multiple layers rather than only the top layer
Abstract
Grounding a command to the visual environment is an essential ingredient for interactions between autonomous vehicles and humans. In this work, we study the problem of language grounding for autonomous vehicles, which aims to localize a region in a visual scene according to a natural language command from a passenger. Prior work only employs the top layer representations of a vision-and-language pre-trained model to predict the region referred to by the command. However, such a method omits the useful features encoded in other layers, and thus results in inadequate understanding of the input scene and command. To tackle this limitation, we present the first layer fusion approach for this task. Since different visual regions may require distinct types of features to disambiguate them from each other, we further propose the region-specific dynamic (RSD) layer attention to adaptively fuse…
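To make the idea of region-specific layer fusion concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract is truncated before the exact formulation, so the scoring rule is an assumption. In this sketch, each candidate region forms a query from its own top-layer feature using a hypothetical projection `w_query`. The query is scored against that region's feature at every layer, and the layers are then averaged with per-region softmax weights. The function names `rsd_layer_fusion` and `predict_region`, and the linear head `w_out`, are illustrative only.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def rsd_layer_fusion(hidden: np.ndarray, w_query: np.ndarray):
    """Fuse per-layer region features with region-specific layer weights.

    hidden:  (num_layers, num_regions, dim) features of each candidate region
             taken from every layer of the encoder.
    w_query: (dim, dim) hypothetical projection turning each region's
             top-layer feature into a query over layers.
    Returns (fused, weights): fused is (num_regions, dim); weights is
    (num_layers, num_regions) and sums to 1 over layers for every region.
    """
    num_layers, num_regions, dim = hidden.shape
    # Region-specific query from the region's own top-layer feature, so
    # different regions can prefer different layers (an assumed design).
    queries = hidden[-1] @ w_query                           # (num_regions, dim)
    # Scaled dot-product score of each layer's feature against the query.
    scores = np.einsum("lrd,rd->lr", hidden, queries) / np.sqrt(dim)
    weights = softmax(scores, axis=0)                        # normalize over layers
    # Weighted sum over layers, computed independently for each region.
    fused = np.einsum("lr,lrd->rd", weights, hidden)
    return fused, weights

def predict_region(fused: np.ndarray, w_out: np.ndarray) -> int:
    """Hypothetical linear scoring head: return the highest-scoring region index."""
    logits = fused @ w_out                                   # (num_regions,)
    return int(np.argmax(logits))

# Usage with random features: 12 layers, 5 candidate regions, 64-dim features.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 5, 64))
fused, weights = rsd_layer_fusion(hidden, rng.normal(size=(64, 64)) * 0.1)
print(fused.shape, weights.shape)  # (5, 64) (12, 5)
region = predict_region(fused, rng.normal(size=64))
```

The top-layer-only baseline described in the abstract corresponds to using `hidden[-1]` directly. In the fused version, each region can draw on lower or middle layers when those separate it better from the other candidates.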
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Human Pose and Action Recognition
