CRePE: Curved Ray Expectation Positional Encoding for Unified-Camera-Controlled Video Generation

Seonghyun Jin; Youngmin Kim; Sunwoo Park; Jong Chul Ye

arXiv:2605.12938 · cs.CV · May 14, 2026

TL;DR

CRePE is a camera-aware positional encoding for video generation that extends attention-level camera control to wide-angle and fisheye lenses under the Unified Camera Model, improving camera-control stability and geometry awareness.

Contribution

The paper proposes CRePE, a Unified Camera Model-compatible positional encoding that represents each image token as a depth-aware positional distribution along its source ray. This improves camera-control stability and supports external scene-geometry control.

Findings

1. CRePE improves camera-control stability across diverse camera models.

2. CRePE improves geometry-aware and perceptual-quality metrics in video generation.

3. CRePE supports external scene-geometry control and motion transfer.

Abstract

Camera-conditioned video generation requires positional encoding that remains reliable under changes in camera motion, lens configuration, and scene structure. However, existing attention-level camera encodings either provide ray-only camera signals or rely on pinhole camera geometry, limiting their applicability to general camera control under the Unified Camera Model, including wide-angle and fisheye lenses. To address this limitation, we propose Curved Ray Expectation Positional Encoding (CRePE). CRePE represents each image token as a depth-aware positional distribution along its source ray, providing a Unified Camera Model-compatible positional encoding that captures the projected-path geometry induced by wide-angle and fisheye cameras. CRePE is implemented through a Geometric Attention Adapter added to frozen video DiTs, injecting token-wise scene-distance information into selected…
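
To make the "projected-path geometry" in the abstract concrete, here is a minimal sketch in plain Python. It assumes the common sphere-offset parameterization of the Unified Camera Model with a single parameter `xi` (where `xi = 0` reduces to a pinhole camera); the paper's exact parameterization may differ. All function names, the intrinsics, and the pure-translation second camera are hypothetical choices made for illustration. The sketch is not the authors' implementation and does not model the Geometric Attention Adapter. It unprojects a pixel to its source ray, samples points along that ray, projects them into a translated camera, and takes a depth-weighted expectation of the projected positions.

```python
import math


def ucm_project(point, fx, fy, cx, cy, xi):
    """Project a 3D point (camera frame) to pixels with a sphere-offset UCM."""
    x, y, z = point
    d = math.sqrt(x * x + y * y + z * z)
    denom = z + xi * d  # xi = 0 gives the pinhole divide-by-z
    if denom <= 0.0:
        raise ValueError("point lies outside the valid projection region")
    return (fx * x / denom + cx, fy * y / denom + cy)


def ucm_unproject(pixel, fx, fy, cx, cy, xi):
    """Return the unit-norm ray direction for a pixel under the same model."""
    u, v = pixel
    mx = (u - cx) / fx
    my = (v - cy) / fy
    r2 = mx * mx + my * my
    disc = 1.0 + (1.0 - xi * xi) * r2
    if disc < 0.0:
        raise ValueError("pixel has no valid ray for this xi")
    factor = (xi + math.sqrt(disc)) / (1.0 + r2)
    return (factor * mx, factor * my, factor - xi)


def projected_ray_path(pixel, depths, baseline, cam):
    """Project samples of a source ray into a second, translated camera.

    Assumption: both cameras share intrinsics and orientation, and the second
    camera center sits at `baseline` in the source camera frame.
    """
    ray = ucm_unproject(pixel, **cam)
    path = []
    for t in depths:
        p = tuple(t * r - b for r, b in zip(ray, baseline))
        path.append(ucm_project(p, **cam))
    return path


def expected_position(path, weights):
    """Depth-weighted mean of projected positions (weights need not sum to 1)."""
    total = sum(weights)
    u = sum(w * p[0] for w, p in zip(weights, path)) / total
    v = sum(w * p[1] for w, p in zip(weights, path)) / total
    return (u, v)


def collinearity_residual(path):
    """2D cross product of the first three path points; near 0 means a straight line."""
    (u0, v0), (u1, v1), (u2, v2) = path[:3]
    return (u1 - u0) * (v2 - v0) - (v1 - v0) * (u2 - u0)


if __name__ == "__main__":
    fisheye = dict(fx=300.0, fy=300.0, cx=320.0, cy=240.0, xi=0.8)
    path = projected_ray_path((500.0, 100.0), [1.0, 2.0, 4.0], (0.5, 0.0, 0.0), fisheye)
    print("projected path:", path)
    print("expected position:", expected_position(path, [0.2, 0.5, 0.3]))
    print("collinearity residual:", collinearity_residual(path))
```

With `xi = 0` the projected samples fall on a straight line, as expected for a pinhole camera. With `xi > 0` they generally trace a curve, so a depth-weighted expectation along that path is not equivalent to a single point on a straight epipolar line.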
