BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios
Jiafei Duan, Samson Yu, Nicholas Tan, Li Yi, Cheston Tan

TL;DR
This paper introduces BOSS, a challenging multimodal video dataset for evaluating how well AI systems predict human belief states from nonverbal cues and object-context relations, an ability that is crucial for safe human-robot interaction.
Contribution
The paper presents a novel dataset with detailed belief state labels and multimodal inputs, advancing research in AI understanding of human beliefs in social scenarios.
Findings
Baseline models' performance varies with input modalities.
Object-context relations significantly influence belief prediction accuracy.
Multimodal data improves AI's ability to infer human beliefs.
Abstract
Humans with an average level of social cognition can infer the beliefs of others based solely on the nonverbal communication signals (e.g. gaze, gesture, pose and contextual information) exhibited during social interactions. This social cognitive ability to predict human beliefs and intentions is more important than ever for ensuring safe human-robot interaction and collaboration. This paper uses the combined knowledge of Theory of Mind (ToM) and Object-Context Relations to investigate methods for enhancing collaboration between humans and autonomous systems in environments where verbal communication is prohibited. We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems in predicting human belief states in an object-context scenario. The proposed dataset consists of precise labelling of human belief state…
Taxonomy
Topics: Social Robot Interaction and HRI · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
