Self-view Grounding Given a Narrated 360° Video

Shih-Han Chou; Yi-Chun Chen; Kuo-Hao Zeng; Hou-Ning Hu; Jianlong Fu; Min Sun

arXiv:1711.08664·cs.CV·November 27, 2017·1 cites

TL;DR

This paper introduces a visual grounding model that automatically predicts the Normal Field of View (NFoV) matching the narration in a narrated 360-degree video. It uses both the video content and the subtitles, so viewers can be guided to the relevant view without manual NFoV annotations.

Contribution

The proposed Visual Grounding Model (VGM) combines a CNN, a GRU-based RNN, and soft attention to ground NFoVs in 360-degree videos from subtitles, without requiring manual annotations.

Findings

1. Achieves state-of-the-art NFoV-grounding performance on a new dataset.

2. Integrates video features and subtitles to predict NFoVs.

3. Introduces a reverse sentence training strategy to improve model robustness.

Abstract

Narrated 360° videos are typically provided in many touring scenarios to mimic real-world experience. However, previous work has shown that smart assistance (i.e., providing visual guidance) can significantly help users to follow the Normal Field of View (NFoV) corresponding to the narrative. In this project, we aim to automatically ground the NFoVs of a 360° video given subtitles of the narrative (referred to as "NFoV-grounding"). We propose a novel Visual Grounding Model (VGM) to implicitly and efficiently predict the NFoVs given the video content and subtitles. Specifically, at each frame, we efficiently encode the panorama into a feature map of candidate NFoVs using a Convolutional Neural Network (CNN) and the subtitles into the same hidden space using an RNN with Gated Recurrent Units (GRU). Then, we apply soft-attention on candidate NFoVs to trigger sentence decoder…
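
The soft-attention step described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate NFoV and the subtitle have already been encoded as vectors in the same hidden space, and it scores candidates with a plain dot product. The scoring function, the function names (`soft_attention`, `ground_nfov`), and the toy vectors are assumptions made for illustration; they are not taken from the paper or from the linked repository, which uses PyTorch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(candidates, query):
    """Attend over candidate NFoV features with a subtitle embedding.

    candidates: list of N feature vectors (one per candidate NFoV).
    query: subtitle embedding of the same dimension.
    Returns (weights, context): attention weights over the N candidates
    and the weighted sum of candidate features.
    Assumption: dot-product scoring stands in for the model's scorer.
    """
    scores = [sum(c_d * q_d for c_d, q_d in zip(c, query)) for c in candidates]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)
    ]
    return weights, context


def ground_nfov(candidates, query):
    """Pick the index of the candidate NFoV with the highest attention."""
    weights, _ = soft_attention(candidates, query)
    return max(range(len(weights)), key=weights.__getitem__)


# Toy example: three candidate NFoVs with 2-dimensional features.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 2.0]
weights, context = soft_attention(candidates, query)
best = ground_nfov(candidates, query)
```

In this toy setup the second candidate aligns best with the query, so it receives the largest weight. In a model of the kind the abstract describes, the weighted context vector would feed the sentence decoder, and the attention weights would serve as the grounding signal.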

Code & Models

Repositories

ShihHanChou/360grounding
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition