Pathformer3D: A 3D Scanpath Transformer for 360° Images
Rong Quan, Yantao Lai, Mengyu Qiu, Dong Liang

TL;DR
Pathformer3D introduces a novel 3D Transformer-based model for predicting scanpaths in 360-degree images, effectively addressing distortion issues of 2D projections and improving accuracy in virtual reality applications.
Contribution
The paper proposes a 3D spherical coordinate system approach and a Transformer architecture for scanpath prediction, outperforming existing 2D-based methods.
Findings
Outperforms state-of-the-art methods on four datasets
Utilizes 3D spherical coordinates to reduce distortion
Employs Transformer architecture to model fixation dependencies
Abstract
Scanpath prediction in 360° images can help realize rapid rendering and better user interaction in Virtual/Augmented Reality applications. However, existing scanpath prediction models for 360° images perform scanpath prediction on the 2D equirectangular projection plane, which results in large computation errors owing to the 2D plane's distortion and coordinate discontinuity. In this work, we perform scanpath prediction for 360° images in the 3D spherical coordinate system and propose a novel 3D scanpath Transformer named Pathformer3D. Specifically, a 3D Transformer encoder is first used to extract a 3D contextual feature representation for the 360° image. Then, the contextual feature representation and historical fixation information are input into a Transformer decoder to output the current time step's fixation embedding, where the self-attention module is used to imitate…
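The coordinate discontinuity mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the 2048x1024 image size, and the longitude/latitude convention are assumptions chosen for illustration. It maps two equirectangular pixel positions onto the unit sphere and compares their planar distance with their angular separation on the sphere.

```python
import math

def equirect_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit vector (x, y, z).

    Assumed convention (hypothetical, not taken from the paper):
    u spans longitude [-pi, pi) left to right, v spans latitude
    [pi/2, -pi/2] top to bottom.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)

def great_circle_angle(p, q):
    """Angle in radians between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.acos(dot)

def planar_distance(u1, v1, u2, v2):
    """Euclidean pixel distance on the 2D projection plane."""
    return math.hypot(u2 - u1, v2 - v1)

# Two fixations on the equator, close to the left and right image borders.
WIDTH, HEIGHT = 2048, 1024  # assumed image size
a = equirect_to_sphere(10, 512, WIDTH, HEIGHT)
b = equirect_to_sphere(2038, 512, WIDTH, HEIGHT)

pixels_apart = planar_distance(10, 512, 2038, 512)    # 2028 px on the plane
degrees_apart = math.degrees(great_circle_angle(a, b))  # about 3.52 degrees
```

On the 2D plane the two points appear almost a full image width apart, while on the sphere they are only a few degrees apart because the left and right borders of the projection meet. Working directly with unit vectors avoids this wrap-around artifact.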
Taxonomy
Topics: Advanced Vision and Imaging · Medical Image Segmentation Techniques · Optical Coherence Tomography Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax
