Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu, Motonari Kambara, Shumpei Hatanaka, Haruka Matsuo, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, and Komei Sugiura

TL;DR
This paper introduces a novel model for future collision captioning in domestic service robots, combining nearest neighbor retrieval and collision region attention to improve descriptive accuracy of potential risks.
Contribution
It proposes the Nearest Neighbor Future Captioning Model with a Collision Attention Module, explicitly modeling collision regions for better risk description in robot planning.
Findings
Outperforms baseline methods on CIDEr-D score (33.08 vs. 25.09)
Introduces a new collision dataset for DSRs
Enhances explainability of robot collision predictions
Abstract
Domestic service robots (DSRs) that support people in everyday environments have been widely investigated. However, their ability to predict and describe future risks resulting from their own actions remains insufficient. In this study, we focus on the linguistic explainability of DSRs. Most existing methods do not explicitly model the region of possible collisions; thus, they do not properly generate descriptions of these regions. In this paper, we propose the Nearest Neighbor Future Captioning Model, which introduces the Nearest Neighbor Language Model for future captioning of possible collisions and enhances the model output with a nearest-neighbor retrieval mechanism. Furthermore, we introduce the Collision Attention Module, which attends to regions of possible collisions and enables our model to generate descriptions that adequately reflect the objects associated with possible…
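The retrieval idea named in the abstract can be illustrated with a minimal sketch of generic nearest-neighbor language model interpolation: the base model's next-token distribution is mixed with a distribution built from the closest stored contexts. This is not the authors' implementation. The function name `knn_lm_next_token_probs`, the datastore layout, and the parameters `k`, `lam`, and `temperature` are all assumptions chosen for illustration.

```python
import math


def knn_lm_next_token_probs(lm_probs, query, datastore, k=2, lam=0.5, temperature=1.0):
    """Mix a base LM distribution with a nearest-neighbor distribution.

    Hypothetical helper for illustration only.
    lm_probs:  list of floats, base model distribution over the vocabulary
    query:     list of floats, feature vector for the current context
    datastore: list of (key_vector, next_token_id) pairs from training data
    """
    # Squared L2 distance from the query to every stored key.
    scored = []
    for key, token_id in datastore:
        dist = sum((q - x) ** 2 for q, x in zip(query, key))
        scored.append((dist, token_id))

    # Keep the k closest entries.
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]

    # Softmax over negative distances; subtracting the smallest distance
    # keeps the exponentials from underflowing to zero all at once.
    d_min = neighbors[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in neighbors]
    total = sum(weights)

    # Aggregate neighbor weights per target token.
    knn_probs = [0.0] * len(lm_probs)
    for w, (_, token_id) in zip(weights, neighbors):
        knn_probs[token_id] += w / total

    # Linear interpolation between the two distributions.
    return [lam * pk + (1.0 - lam) * pl for pk, pl in zip(knn_probs, lm_probs)]


if __name__ == "__main__":
    # Toy vocabulary of three tokens and a three-entry datastore (made-up values).
    lm_probs = [0.5, 0.3, 0.2]
    datastore = [([0.0, 0.0], 1), ([1.0, 1.0], 2), ([5.0, 5.0], 0)]
    print(knn_lm_next_token_probs(lm_probs, [0.0, 0.0], datastore))
```

In this sketch, `lam` controls how much the retrieved neighbors override the base model: `lam=0` returns the base distribution unchanged, and `lam=1` relies on retrieval alone.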
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Natural Language Processing Techniques
