RayRoPE: Projective Ray Positional Encoding for Multi-view Attention
Yu Wu, Minsik Jeon, Jen-Hao Rick Chang, Oncel Tuzel, Shubham Tulsiani

TL;DR
RayRoPE introduces a novel projective ray-based positional encoding for multi-view transformers that encodes patches uniquely, is $SE(3)$-invariant, and adapts to scene geometry, improving performance on view synthesis and depth estimation.
Contribution
The paper proposes RayRoPE, a new positional encoding scheme that encodes patches based on rays, ensuring $SE(3)$ invariance and scene-adaptive geometry modeling, addressing limitations of prior encodings.
Findings
RayRoPE achieves a 24% improvement in LPIPS on RE10K.
RayRoPE improves stereo depth estimation accuracy.
The method is efficient and outperforms existing encoding schemes.
Abstract
We study positional encodings for multi-view transformers that process tokens from a set of posed input images, and seek a mechanism that encodes patches uniquely, allows $SE(3)$-invariant attention with multi-frequency similarity, and can adapt to the geometry of the underlying 3D scene. We find that prior (absolute or relative) encoding schemes for multi-view attention do not meet these desiderata, and present RayRoPE to address this gap. RayRoPE represents patch positions based on associated rays and computes query-frame projective coordinates to ensure invariance. To adapt to scene geometry, RayRoPE predicts (without direct supervision) a per-token depth to obtain its position along the corresponding ray, while also modeling uncertainty and analytically computing the expected positional encoding. We validate our method on the tasks of novel-view synthesis and stereo depth…
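The invariance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the camera-to-world pose convention, and the $(x/z, y/z, 1/z)$ parametrization are all assumptions made for illustration. The sketch places a point at a given depth along a key-view ray, expresses it in the query camera's frame, and checks that the resulting coordinates do not change when every camera is moved by the same rigid transform.

```python
import numpy as np

def random_rigid(rng):
    """Return a random 4x4 rigid transform (proper rotation plus translation)."""
    # QR of a random matrix gives an orthogonal Q; flip one column if det = -1.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    t = np.eye(4)
    t[:3, :3] = q
    t[:3, 3] = rng.normal(size=3)
    return t

def point_on_ray(cam_to_world, direction_cam, depth):
    """Homogeneous world point at `depth` along a ray given in camera coordinates."""
    d = direction_cam / np.linalg.norm(direction_cam)
    p_cam = np.append(depth * d, 1.0)
    return cam_to_world @ p_cam

def query_frame_projective(query_cam_to_world, p_world):
    """Express a world point in the query camera frame as (x/z, y/z, 1/z).

    This parametrization is an assumed stand-in for "projective coordinates".
    """
    p_q = np.linalg.inv(query_cam_to_world) @ p_world
    x, y, z = p_q[:3]
    return np.array([x / z, y / z, 1.0 / z])

# Hypothetical two-view setup: query camera at the origin, key camera shifted along x.
query_pose = np.eye(4)
key_pose = np.eye(4)
key_pose[0, 3] = 1.0

ray_dir = np.array([0.0, 0.0, 1.0])  # ray through the key camera's optical axis
depth = 2.0                          # stands in for a predicted per-token depth

coords = query_frame_projective(query_pose, point_on_ray(key_pose, ray_dir, depth))

# Apply the same global rigid transform G to both cameras (a change of world frame).
G = random_rigid(np.random.default_rng(0))
coords_moved = query_frame_projective(
    G @ query_pose, point_on_ray(G @ key_pose, ray_dir, depth)
)

print(coords)                             # [0.5 0.  0.5]
print(np.allclose(coords, coords_moved))  # True
```

Because the coordinates depend only on the relative pose between the query and key cameras, any encoding built from them inherits the same invariance. The depth uncertainty and expected-encoding steps mentioned in the abstract are not modeled here.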
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
