FreeVS: Generative View Synthesis on Free Driving Trajectory

Qitai Wang; Lue Fan; Yuqi Wang; Yuntao Chen; Zhaoxiang Zhang

arXiv:2410.18079·cs.CV·October 24, 2024

Open Access 3 Reviews

TL;DR

FreeVS introduces a fully generative method for synthesizing realistic driving scene views on arbitrary trajectories, overcoming limitations of existing methods that only work along recorded paths, and is validated on the Waymo dataset.

Contribution

The paper presents FreeVS, a novel generative approach for free-viewpoint synthesis in driving scenes, with a pseudo-image representation and new benchmarks for viewpoint freedom.

Findings

01

Strong synthesis performance on recorded trajectories

02

Effective view generation on novel trajectories

03

Validated on Waymo dataset

Abstract

Existing reconstruction-based novel view synthesis methods for driving scenes focus on synthesizing camera views along the recorded trajectory of the ego vehicle. Their image rendering performance degrades severely on viewpoints that fall outside the recorded trajectory, where camera rays are untrained. We propose FreeVS, a novel fully generative approach that can synthesize camera views on free new trajectories in real driving scenes. To keep the generated results 3D-consistent with the real scenes and accurate in viewpoint pose, we propose the pseudo-image representation of view priors to control the generation process. Viewpoint transformation simulation is applied to pseudo-images to simulate camera movement in each direction. Once trained, FreeVS can be applied to any validation sequence without a reconstruction process and can synthesize views on novel trajectories. Moreover,…
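The pseudo-image idea, which Reviewer 01 describes as colored point cloud projection, can be illustrated with a short sketch. This is not the authors' implementation: it assumes a simple pinhole camera, and the function name `render_pseudo_image` and all of its parameters are hypothetical names chosen for illustration. Changing the `world_to_cam` pose while keeping the same points is one plausible way to picture a shifted viewpoint.

```python
import numpy as np


def render_pseudo_image(points_world, colors, K, world_to_cam, height, width):
    """Project colored 3D points into a pinhole camera (illustrative sketch).

    points_world: (N, 3) point positions in the world frame.
    colors:       (N, 3) uint8 RGB color per point.
    K:            (3, 3) camera intrinsics (assumed pinhole model).
    world_to_cam: (4, 4) rigid transform from world to camera frame.
    Returns an (height, width, 3) uint8 image; empty pixels stay black.
    """
    points_world = np.asarray(points_world, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.uint8)

    # Move the points into the camera frame using homogeneous coordinates.
    ones = np.ones((len(points_world), 1))
    pts_cam = (world_to_cam @ np.hstack([points_world, ones]).T).T[:, :3]

    # Discard points behind (or on) the image plane.
    in_front = pts_cam[:, 2] > 1e-6
    pts_cam, colors = pts_cam[in_front], colors[in_front]

    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = pts_cam[:, 2]

    # Keep only projections that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth, colors = u[inside], v[inside], depth[inside], colors[inside]

    # Sort near-to-far, then keep the first (nearest) point per pixel.
    order = np.argsort(depth, kind="stable")
    u, v, colors = u[order], v[order], colors[order]
    _, nearest = np.unique(v * width + u, return_index=True)

    image = np.zeros((height, width, 3), dtype=np.uint8)
    image[v[nearest], u[nearest]] = colors[nearest]
    return image
```

The result is a sparse image: pixels that no point projects to stay black, and where several points hit the same pixel the nearest one is kept. Calling the function again with a translated `world_to_cam` moves the projected points across the image, which gives a rough picture of how a prior for a new camera pose could be produced from the same points.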

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

### Novelty
Clever use of pseudo-images obtained through colored point cloud projection as a unified representation for all view priors, simplifying the learning objective for the generative model.

### Evaluation
Introduces two new challenging benchmarks: novel camera synthesis and novel trajectory synthesis.

### Efficiency
The authors claim it takes fewer computational resources at inference time than splatting-based models.

### Performance
Better performance versus competing methods,

Weaknesses

### Novelty
Engineering work: it boils down to an add-on for Video Stable Diffusion that has colored LiDAR point features concatenated.

### Different type of artifacts
The method trades the Gaussian and NeRF artifacts for diffusion ones. While there is no denying that FreeVS works better than the previous attempts on novel views, for a single front view, splatting still yields significantly better results (Table 3, front view).

### Evaluation
A single dataset is benchmarked (Waymo Open D

Reviewer 02 · Rating 6 · Confidence 4

Strengths

Impressive results on a challenging task
- Results are clearly better than baselines in novel camera synthesis and multi-view novel frame synthesis, and significantly better in novel trajectory synthesis (FID drops by nearly 75%)
- Qualitative comparisons clearly back these results
- Video results show impressive generation well outside the input trajectory, while other methods have severe artifacts or fail entirely

Creative use of 3D and off-the-shelf models to enable a non-conventional setup
- Novel

Weaknesses

Could use a clearer argument for how the method leads to the performance gain
- Numbers in the ablations table do not match those in the comparisons to baselines. Why not?
- Ablations show little impact on performance. When the FID of this method is less than a third of that of baselines, surely more than 10% of performance can be explained by choices. For example, how does training data impact performance? What about pretraining or architecture? If these are important, it feels the architecture should be descri

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. Much better (more robust) results than 3D-optimization-based approaches (EmerNeRF, Street Gaussians) on far-away novel views, because those approaches typically have overfitting issues.
2. A combination of 3D information and a 2D diffusion model that provides both controllability and decent rendering results.

Weaknesses

1. The rendering speed is very slow, whereas 3DGS can render in real time (50+ fps); this hinders downstream applications that require real-time efficiency.
2. Inconsistent results because of using a large decoder.
3. Worse performance than 3D-optimization approaches if the novel views are close to the source views.

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Simulation Techniques and Applications · Semantic Web and Ontologies

Methods: Focus