Pathformer3D: A 3D Scanpath Transformer for 360° Images

Rong Quan; Yantao Lai; Mengyu Qiu; Dong Liang

arXiv:2407.10563 · cs.CV · July 16, 2024

TL;DR

Pathformer3D is a Transformer-based model that predicts scanpaths for 360° images directly in 3D spherical coordinates, avoiding the distortion and coordinate discontinuity of 2D equirectangular projections and improving prediction accuracy for virtual reality applications.

Contribution

The paper proposes performing scanpath prediction in a 3D spherical coordinate system with a Transformer architecture, and reports that this approach outperforms existing methods that operate on 2D projections.
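
A minimal sketch of why spherical coordinates help, assuming a standard equirectangular layout (longitude spans the image width from −π to π, latitude spans the height from π/2 at the top to −π/2 at the bottom). The helper names `erp_to_sphere`, `great_circle_angle`, and `erp_pixel_distance` are hypothetical and are not taken from the Pathformer3D repository. Two pixels on either side of the left/right seam are almost the full image width apart in the 2D plane, yet they nearly coincide on the sphere.

```python
import math


def erp_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit vector (x, y, z).

    Assumed layout (hypothetical, for illustration): u in [0, width) maps to
    longitude [-pi, pi) left to right; v in [0, height] maps to latitude
    [pi/2, -pi/2] top to bottom.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def great_circle_angle(p, q):
    """Geodesic distance (radians) between two unit vectors on the sphere.

    Uses atan2(|p x q|, p . q), which stays accurate for nearly identical
    or nearly opposite vectors, unlike acos(p . q).
    """
    cross = (
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    )
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(a * b for a, b in zip(p, q))
    return math.atan2(cross_norm, dot)


def erp_pixel_distance(p, q):
    """Naive Euclidean distance measured in the 2D equirectangular plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Usage: two fixations on the equator, one just right of the left seam
# and one just left of the right seam of a hypothetical 2048 x 1024 image.
W, H = 2048, 1024
a = (1.0, 512.0)
b = (2047.0, 512.0)

d2d = erp_pixel_distance(a, b)  # large: almost the full image width
d3d = great_circle_angle(erp_to_sphere(*a, W, H), erp_to_sphere(*b, W, H))  # tiny angle
print(d2d, d3d)
```

Measuring error as the great-circle angle between 3D unit vectors removes the wrap-around at the seam, whereas a 2D pixel metric would penalize this near-perfect prediction by almost the full image width.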

Findings

01 Outperforms state-of-the-art methods on four datasets

02 Utilizes 3D spherical coordinates to reduce distortion

03 Employs Transformer architecture to model fixation dependencies
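
The distortion mentioned in finding 02 can be made concrete with a worked equation, assuming a standard equirectangular mapping of an image of width W and height H (the symbols u, v, W, H, λ, φ are introduced here for illustration). On the unit sphere the area element is cos φ dφ dλ, so the solid angle covered by one pixel shrinks toward the poles:

```latex
\lambda = \frac{2\pi u}{W} - \pi, \qquad
\varphi = \frac{\pi}{2} - \frac{\pi v}{H}, \qquad
d\Omega = \cos\varphi \, d\varphi \, d\lambda
\;\;\Rightarrow\;\;
\Delta\Omega_{\text{pixel}} \approx \frac{2\pi}{W}\cdot\frac{\pi}{H}\cos\varphi
= \frac{2\pi^{2}}{WH}\cos\varphi .
```

At the equator (φ = 0) a pixel covers 2π²/(WH) steradians, while near the poles cos φ approaches 0, so equal pixel steps in the 2D plane correspond to very different distances on the sphere. In addition, λ = −π and λ = π describe the same meridian, which produces the coordinate discontinuity at the left and right image edges.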

Abstract

Scanpath prediction in 360° images can help realize rapid rendering and better user interaction in Virtual/Augmented Reality applications. However, existing scanpath prediction models for 360° images perform prediction on the 2D equirectangular projection plane, which always results in large computation errors owing to the plane's distortion and coordinate discontinuity. In this work, we perform scanpath prediction for 360° images in a 3D spherical coordinate system and propose a novel 3D scanpath Transformer named Pathformer3D. Specifically, a 3D Transformer encoder first extracts a 3D contextual feature representation of the 360° image. Then, the contextual feature representation and historical fixation information are input into a Transformer decoder, which outputs the fixation embedding for the current time step, where the self-attention module is used to imitate…
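
The abstract states that historical fixation information enters a Transformer decoder whose self-attention module relates the current step to earlier fixations. A minimal sketch of that idea follows, under two loudly stated assumptions: the decoder is autoregressive with a causal mask (a common choice for step-by-step generation, not confirmed by the truncated abstract), and the query/key/value projections are identities, whereas a trained decoder learns them. `causal_self_attention` is a hypothetical plain-Python helper, not code from the official PyTorch repository.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(xs: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with a causal mask.

    Assumption for brevity: identity Q/K/V projections (a real decoder
    learns W_q, W_k, W_v). Step t attends only to steps 0..t, so a fixation
    embedding never depends on fixations that come after it.
    """
    d = len(xs[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t, query in enumerate(xs):
        visible = xs[: t + 1]  # causal mask: keys/values up to and including step t
        weights = softmax([dot(query, key) * scale for key in visible])
        out.append([sum(w * v[i] for w, v in zip(weights, visible)) for i in range(d)])
    return out


# Usage with hypothetical toy fixation embeddings (here, 3D unit vectors).
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = causal_self_attention(history)

# Replacing the last fixation leaves the earlier outputs untouched.
out_changed = causal_self_attention(history[:2] + [[5.0, 5.0, 5.0]])
```

Because step t only sees steps 0..t, changing or appending a later fixation never alters the embeddings already computed for earlier steps, which is what allows a decoder of this kind to generate a scanpath one fixation at a time.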

Code & Models

Repositories

lsztzp/pathformer3d
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Medical Image Segmentation Techniques · Optical Coherence Tomography Applications

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax