The P$^3$ dataset: Pixels, Points and Polygons for Multimodal Building Vectorization
Raphael Sulzer, Liuyun Duan, Nicolas Girard, Florent Lafarge

TL;DR
The P$^3$ dataset provides a comprehensive multimodal benchmark combining aerial LiDAR, imagery, and building outlines, enabling improved building vectorization through multimodal data fusion and robust prediction models.
Contribution
This paper introduces the large-scale P$^3$ dataset that integrates LiDAR, imagery, and vector data for building vectorization, and demonstrates the benefits of multimodal fusion for accurate polygon prediction.
Findings
LiDAR data enhances building polygon prediction robustness.
Fusing LiDAR and imagery improves accuracy and geometric quality.
State-of-the-art models benefit from the multimodal dataset.
Abstract
We present the P$^3$ dataset, a large-scale multimodal benchmark for building vectorization, constructed from aerial LiDAR point clouds, high-resolution aerial imagery, and vectorized 2D building outlines, collected across three continents. The dataset contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 centimeters. While many existing datasets primarily focus on the image modality, P$^3$ offers a complementary perspective by also incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons, both in hybrid and end-to-end learning frameworks. Moreover, fusing aerial LiDAR and imagery further improves accuracy and geometric quality of predicted polygons. The P$^3$ dataset is publicly available, along with code and pretrained weights of three…
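To make the idea of fusing LiDAR and imagery concrete, here is a minimal sketch of one simple fusion scheme: rasterize the point cloud into a per-pixel maximum-height grid aligned with the image tile, then stack it onto the RGB channels. This is an illustrative assumption, not necessarily how the paper's models fuse the two modalities. The function names (`rasterize_height`, `fuse_rgb_height`), the top-left tile origin convention, and the choice of filling empty cells with 0 are all hypothetical; only the 0.25 m ground sampling distance comes from the abstract.

```python
import numpy as np

GSD_METERS = 0.25  # ground sampling distance stated in the abstract


def rasterize_height(points, origin, shape, gsd=GSD_METERS):
    """Rasterize LiDAR points into a max-height grid (hypothetical helper).

    points: (N, 3) array of x, y, z coordinates in meters.
    origin: (x_min, y_max), the top-left corner of the image tile (assumed).
    shape:  (rows, cols) of the image tile.
    """
    rows_n, cols_n = shape
    cols = np.floor((points[:, 0] - origin[0]) / gsd).astype(int)
    rows = np.floor((origin[1] - points[:, 1]) / gsd).astype(int)

    # Keep only points that fall inside the tile.
    inside = (rows >= 0) & (rows < rows_n) & (cols >= 0) & (cols < cols_n)

    grid = np.full(shape, -np.inf, dtype=np.float32)
    # Unbuffered maximum: several points may land in the same cell.
    np.maximum.at(
        grid,
        (rows[inside], cols[inside]),
        points[inside, 2].astype(np.float32),
    )
    # Assumption: cells without any LiDAR return are set to 0.
    grid[np.isinf(grid)] = 0.0
    return grid


def fuse_rgb_height(rgb, height):
    """Stack normalized RGB (H, W, 3) with a height map (H, W) into (H, W, 4)."""
    rgb_norm = rgb.astype(np.float32) / 255.0
    return np.concatenate([rgb_norm, height[..., None]], axis=-1)


# Tiny usage example on a 4 x 4 pixel tile covering 1 m x 1 m.
points = np.array([
    [0.1, 0.9, 5.0],
    [0.1, 0.9, 7.0],  # same cell as the first point, higher return
    [0.6, 0.1, 3.0],
])
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
height = rasterize_height(points, origin=(0.0, 1.0), shape=(4, 4))
fused = fuse_rgb_height(rgb, height)
```

Channel stacking of this kind is the simplest form of early fusion. Learned fusion of separate image and point-cloud encoders is another option, and the sketch above makes no claim about which one the benchmarked models use.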
