FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception
Rahul Ahuja, Mudit Jain, Bala Murali Manoghar Sai Sudhakar, Venkatraman Narayanan, Pratik Likhar, Varun Ravi Kumar, Senthil Yogamani

TL;DR
FishRoPE reparameterizes attention in spherical coordinates and pairs it with a lightweight adaptation framework, so that vision models can handle fisheye camera distortion without being retrained from scratch.
Contribution
The paper proposes FishRoPE, a lightweight, architecture-agnostic method that adapts frozen vision models to fisheye geometry using spherical attention reparameterization.
Findings
Achieves state-of-the-art results on WoodScape 2D detection with 54.3 mAP.
Attains 65.1 mIoU on SynWoodScapes BEV segmentation.
FishRoPE adds negligible computational overhead.
Abstract
Vision foundation models (VFMs) and Bird's Eye View (BEV) representations have advanced visual perception substantially, yet their internal spatial representations assume the rectilinear geometry of pinhole cameras. Fisheye cameras, widely deployed on production autonomous vehicles for their surround-view coverage, exhibit severe radial distortion that renders these representations geometrically inconsistent. At the same time, the scarcity of large-scale fisheye annotations makes retraining foundation models from scratch impractical. We present a lightweight framework that adapts frozen VFMs to fisheye geometry through two components: a frozen DINOv2 backbone with Low-Rank Adaptation (LoRA) that transfers rich self-supervised features to fisheye without task-specific pretraining, and Fisheye Rotary Position Embedding (FishRoPE), which reparameterizes the attention mechanism in the…
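The abstract is cut off before it specifies how FishRoPE is parameterized, so the following is only a minimal sketch of the general idea: a rotary position embedding driven by viewing-ray angles rather than pixel coordinates. Everything here is an assumption made for illustration and not the authors' implementation: the equidistant projection model (r = f * theta), the function names `fisheye_angles`, `rope_rotate`, and `spherical_rope`, the frequency base, and the choice to split channels evenly between the polar angle and the azimuth.

```python
import numpy as np

def fisheye_angles(u, v, cx, cy, f):
    """Map pixel coordinates to ray angles (theta, phi).

    ASSUMPTION: equidistant fisheye model, r = f * theta. The paper's
    actual camera model is not stated in the truncated abstract.
    """
    dx, dy = u - cx, v - cy
    r = np.hypot(dx, dy)
    theta = r / f             # polar angle measured from the optical axis
    phi = np.arctan2(dy, dx)  # azimuth around the optical axis, in (-pi, pi]
    return theta, phi

def rope_rotate(x, pos, base=100.0):
    """Standard RoPE: rotate consecutive channel pairs of x by pos * freq.

    x: (n, d) with d even; pos: (n,) scalar position per token.
    """
    half = x.shape[1] // 2
    freqs = base ** (-np.arange(half) / half)  # (half,) geometric frequencies
    ang = pos[:, None] * freqs[None, :]        # (n, half) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def spherical_rope(x, theta, phi):
    """HYPOTHETICAL layout: first half of channels encodes theta, second half phi.

    x: (n, d) with d divisible by 4.
    """
    h = x.shape[1] // 2
    return np.concatenate(
        [rope_rotate(x[:, :h], theta), rope_rotate(x[:, h:], phi)], axis=1
    )

# Usage: rotate queries and keys with the same angles before computing q @ k.T
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
u = np.array([10.0, 200.0, 320.0, 400.0, 630.0])
v = np.array([5.0, 100.0, 240.0, 300.0, 470.0])
theta, phi = fisheye_angles(u, v, cx=320.0, cy=240.0, f=200.0)
rotated = spherical_rope(tokens, theta, phi)
```

Because each channel pair is only rotated, token norms are preserved, and the query-key dot product within each half depends on the difference of the corresponding angle rather than on pixel offsets. Note that the azimuth wraps at ±π, which this sketch does not handle specially.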
