Attention Guidance through Video Script: A Case Study of Object Focusing on 360° VR Video Tours

Paulo Vitor Santana Silva; Arthur Ricardo Sousa Vitória; Diogo Fernandes Costa Silva; Arlindo Rodrigues Galvão Filho

arXiv:2603.16875·cs.HC·March 19, 2026

TL;DR

This paper explores how combining the Grounding DINO and Segment Anything (SAM) models with video scripts can guide viewer attention toward specific objects in 360° VR video tours.

Contribution

It introduces a novel method integrating object grounding models with video scripts to improve attention guidance in immersive VR environments.

Findings

01 Video scripts improve user attention focus in VR tours

02 Combining Grounding DINO and SAM guides viewers toward scripted objects

03 Improved user experience is reported in the case study

Abstract

Within the expansive domain of virtual reality (VR), 360° VR videos immerse viewers in a spherical environment, allowing them to explore and interact with the virtual world from all angles. While this video representation offers unparalleled levels of immersion, it often lacks effective methods to guide viewers' attention toward specific elements within the virtual environment. This paper combines the Grounding DINO and Segment Anything (SAM) models to guide attention through object focusing based on video scripts. As a case study, this work runs experiments on a 360° video tour of the University of Reading. The experimental results show that video scripts can improve the user experience in 360° VR video tours by helping to direct the user's attention.
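The abstract does not spell out how a script line becomes a point of focus on the sphere, so the following is only a minimal sketch of one plausible way to connect the two, not the authors' implementation. It assumes the video is stored in the common equirectangular projection, that each script cue carries a time window and a text phrase to pass to a grounding model, and that the detector returns a pixel bounding box. `ScriptCue`, `active_phrases`, `box_center_to_yaw_pitch`, and the example phrases are all hypothetical names invented for illustration. The detector and segmenter calls themselves are left out.

```python
from dataclasses import dataclass


@dataclass
class ScriptCue:
    """Hypothetical script entry: a time window plus a phrase for the detector."""
    start_s: float
    end_s: float
    phrase: str


def active_phrases(cues, t):
    """Return the phrases of all cues whose window [start_s, end_s) contains t."""
    return [c.phrase for c in cues if c.start_s <= t < c.end_s]


def box_center_to_yaw_pitch(box, width, height):
    """Map the center of a pixel box (x0, y0, x1, y1) in an equirectangular
    frame to (yaw, pitch) in degrees.

    Assumed convention: yaw 0 is the horizontal center of the frame and grows
    to the right; pitch 0 is the horizon and +90 is the top of the frame.
    Boxes that wrap around the left/right seam are not handled here.
    """
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2.0
    cy = (y0 + y1) / 2.0
    yaw = (cx / width) * 360.0 - 180.0
    pitch = 90.0 - (cy / height) * 180.0
    return yaw, pitch


# Illustrative cues; the phrases are made up and not taken from the paper.
cues = [
    ScriptCue(0.0, 5.0, "clock tower"),
    ScriptCue(5.0, 10.0, "library entrance"),
]

phrases = active_phrases(cues, 6.0)
# A box centered in a 3840x1920 frame maps to the middle of the sphere.
yaw, pitch = box_center_to_yaw_pitch((1870, 910, 1970, 1010), 3840, 1920)
```

In a full pipeline, the phrase returned for the current timestamp would be the text prompt for the grounding model, the resulting box would seed the segmentation model, and the yaw and pitch could be used to place a visual cue or orient the viewport. Those steps depend on design choices the abstract does not describe.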

Taxonomy

Topics: Virtual Reality Applications and Impacts · Human Motion and Animation · Visual Attention and Saliency Detection