Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations

Supawich Sitdhipol; Waritwong Sukprasongdee; Ekapol Chuangsuwanich; Rina Tse

arXiv:2507.19947 · cs.RO · July 31, 2025


TL;DR

This paper introduces FP-LGN, a neural network that grounds spatial language in map features to enable uncertainty-aware fusion of human and robot observations, improving collaborative task performance.

Contribution

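The paper presents the Feature Pyramid Likelihood Grounding Network (FP-LGN), which learns to ground spatial language in map image features and is trained as a probability estimator, so that its output likelihood captures the aleatoric uncertainty of human language.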

Findings

01 FP-LGN matches expert-designed rules in mean Negative Log-Likelihood (NLL)

02 The model is more robust, with a lower standard deviation in NLL

03 Fusing human language with robot sensor observations improves collaborative task performance

Abstract

Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language…
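To make the idea of uncertainty-aware fusion concrete, the following is a minimal sketch of a Bayesian update over a grid map, assuming the robot and human observations are conditionally independent given the target location. It is not the paper's implementation: the grid size, the robot likelihood, and the Gaussian-shaped human likelihood are all hypothetical stand-ins (in the paper, the human likelihood would come from FP-LGN), and the helper names `normalize` and `fuse` are introduced here for illustration only.

```python
import numpy as np

def normalize(p):
    """Scale a non-negative grid so that its cells sum to 1."""
    total = p.sum()
    if total <= 0:
        raise ValueError("distribution has no probability mass")
    return p / total

def fuse(prior, *likelihoods):
    """Grid Bayes update: posterior is proportional to prior times each likelihood."""
    posterior = prior.copy()
    for lik in likelihoods:
        posterior = posterior * lik
    return normalize(posterior)

# Hypothetical 5x5 map; every number below is made up for illustration.
H, W = 5, 5
prior = np.full((H, W), 1.0 / (H * W))  # uniform belief over target location

# Assumed robot sensor likelihood: no detection in the three scanned columns.
robot_lik = np.ones((H, W))
robot_lik[:, :3] = 0.1

# Assumed stand-in for a grounded language likelihood, e.g. a statement like
# "the target is near the landmark" with the landmark at cell (1, 4).
rows, cols = np.mgrid[0:H, 0:W]
dist2 = (rows - 1) ** 2 + (cols - 4) ** 2
human_lik = np.exp(-dist2 / (2 * 1.5 ** 2))

posterior = fuse(prior, robot_lik, human_lik)
map_cell = np.unravel_index(np.argmax(posterior), posterior.shape)

# Negative log-likelihood of a hypothetical ground-truth cell under the posterior.
true_cell = (1, 4)
nll = -np.log(posterior[true_cell])
```

Because the human input enters as a likelihood rather than a hard constraint, a vague statement flattens `human_lik` and shifts the posterior less, which is the property the abstract refers to as uncertainty-aware fusion.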
