CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
Qiwei Wang, Xianghui Ze, Jingyi Yu, Yujiao Shi

TL;DR
CylinderSplat introduces a cylindrical Triplane representation and a dual-branch architecture to improve panoramic 3D Gaussian Splatting, enabling high-quality, real-time novel view synthesis for 360° scenes with sparse views.
Contribution
It proposes a novel cylindrical Triplane representation and a dual-branch framework specifically designed for panoramic scenes, addressing geometric distortion and occlusion issues in 3D Gaussian Splatting.
Findings
Achieves state-of-the-art results in panoramic view synthesis.
Outperforms previous methods in reconstruction quality.
Handles variable input views from single to multiple panoramas.
Abstract
Feed-forward 3D Gaussian Splatting (3DGS) has shown great promise for real-time novel view synthesis, but its application to panoramic imagery remains challenging. Existing methods often rely on multi-view cost volumes for geometric refinement, which struggle to resolve occlusions in sparse-view scenarios. Furthermore, standard volumetric representations like Cartesian Triplanes are poorly suited to capturing the inherent geometry of scenes, leading to distortion and aliasing. In this work, we introduce CylinderSplat, a feed-forward framework for panoramic 3DGS that addresses these limitations. The core of our method is a new cylindrical Triplane representation, which is better aligned with panoramic data and real-world structures adhering to the Manhattan-world assumption. We use a dual-branch architecture: a pixel-based branch reconstructs well-observed regions, while a…
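To make the cylindrical Triplane idea in the abstract more concrete, the following is a minimal sketch of how a 3D point might be mapped to lookup coordinates on three cylindrical planes. It is not the authors' implementation. The choice of z as the vertical axis, the plane pairing (θ, h), (r, θ), (r, h), the normalization bounds `r_max`, `h_min`, `h_max`, and both function names are assumptions made for illustration only.

```python
import math


def cartesian_to_cylindrical(x, y, z):
    """Map a Cartesian point to (r, theta, h).

    Assumption: z is the vertical (cylinder) axis.
    """
    r = math.hypot(x, y)          # radial distance from the vertical axis
    theta = math.atan2(y, x)      # azimuth in [-pi, pi]
    return r, theta, z


def cylindrical_triplane_coords(x, y, z, r_max=10.0, h_min=-2.0, h_max=2.0):
    """Return normalized 2D coordinates in [0, 1] on three hypothetical planes.

    The bounds r_max, h_min, h_max are illustrative, not values from the paper.
    """
    r, theta, h = cartesian_to_cylindrical(x, y, z)
    u_r = min(r / r_max, 1.0)                                # clamp far points
    u_theta = (theta + math.pi) / (2.0 * math.pi)            # wraps at the seam
    u_h = min(max((h - h_min) / (h_max - h_min), 0.0), 1.0)  # clamp height
    return {
        "theta_h": (u_theta, u_h),  # unrolled cylinder surface
        "r_theta": (u_r, u_theta),  # polar floor plan
        "r_h": (u_r, u_h),          # radial cross-section
    }


coords = cylindrical_triplane_coords(1.0, 0.0, 0.0)
```

In a sketch like this, features sampled at the three 2D locations would be combined to describe the 3D point. Note that the azimuth coordinate wraps around at θ = ±π, so any real sampler would need to treat that seam as periodic.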
Peer Reviews
Decision: ICLR 2026 Poster
1. Originality. Introduces a cylindrical triplane representation specifically designed for panoramic 3D Gaussian Splatting. Proposes a three-stage curriculum training strategy (pixel, volume, joint) that significantly enhances reconstruction in occluded and sparsely viewed regions.
2. Quality. Provides comprehensive evaluations across multiple panoramic and indoor datasets with consistent improvements over prior methods. Includes thorough ablation studies demonstrating the effectiveness of the
1. The method relies heavily on a strong depth prior, but the paper does not analyze its robustness or potential failure cases.
2. The domain bias is significant: experiments focus primarily on indoor, Manhattan-world panoramas. Analysis of non-Manhattan or outdoor environments would strengthen the generality claims.
3. The paper does not discuss or address artifacts near seams and poles, which are common in panoramic representations and could impact real-world performance.
- A practical pipeline that avoids heavy cost volumes while supporting a variable number of input panoramas via attention.
- The cylindrical triplane is a sensible geometric choice for Manhattan-style scenes; the paper gives qualitative intuition and ablations showing it outperforms Cartesian and spherical alternatives in this setting.
- The paper measures the effect of each key component: coordinate system, RGB retrieval, multiple per-camera triplanes, and training curriculum.
- Results are broa
- While the cylindrical triplane is well-motivated, it can be viewed as a coordinate adaptation of established triplane+3DGS paradigms rather than a new representation class or learning principle. Much of the performance gain appears to come from engineering, and the method lacks technical novelty.
- The transformation from cylindrical local scales to Cartesian scales is an approximation given without derivation, raising concerns about correctness and whether R and S are consistently transfo
1. The cylindrical coordinate system choice is well-justified through both theoretical analysis (Manhattan-world assumption) and comprehensive ablations showing clear advantages over Cartesian and spherical alternatives.
2. Extensive experiments are shown across multiple datasets with thorough ablations examining triplane resolution, coordinate systems, initialization strategies, and rendering methods. Consistent improvements are shown in both single-view and multi-view settings, with particular
1. Limited novelty: The core contribution is essentially a coordinate system change for existing triplane methods. The dual-branch architecture, attention mechanisms, etc. seem to be borrowed from prior work.
2. Manhattan-world assumption limitations: The method seems to be explicitly designed for environments with orthogonal surfaces (indoor/urban scenes). Applicability to natural outdoor scenes, curved structures, or non-Manhattan environments is unclear.
3. Training complexity: The three-stage
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
