A ROS-Based Online System for 3D Gaussian Splatting Optimization: Flexible Frontend Integration and Real-Time Refinement

Li’an Wang; Jian Xu; Xuan An; Yujie Ji; Yuxuan Wu; Zhaoyuan Ma

PMC · DOI: 10.3390/s25134151 · July 3, 2025

TL;DR

This paper introduces a real-time 3D scene reconstruction system using ROS and Gaussian splatting, improving speed and quality over traditional methods.

Contribution

A ROS-based online system for 3D Gaussian splatting with flexible frontend integration and real-time refinement is proposed.

Findings

1. The system reduces initialization time by 90% compared to traditional COLMAP-3DGS methods.
2. It achieves an average PSNR improvement of 1.9 dB on multiple datasets.
3. The system uses a dynamic sliding-window strategy and a novel loss function for optimization.

Abstract

The 3D Gaussian splatting technique demonstrates significant efficiency advantages in real-time scene reconstruction. However, when its initialization process relies on traditional SfM methods (such as COLMAP), there are obvious bottlenecks, such as high computational resource consumption and the decoupling of camera pose optimization from map construction. This paper proposes an online 3DGS optimization system based on ROS. Through a loosely coupled architecture, it realizes real-time data interaction between the frontend SfM/SLAM module and the backend 3DGS optimization. Using ROS as middleware, the system can access the keyframe poses and point-cloud data generated by any frontend algorithm (such as ORB-SLAM or COLMAP). With the help of a dynamic sliding-window strategy and a rendering-quality loss function that combines L1 and SSIM, it achieves…

Keywords

3D reconstruction; SLAM; 3D Gaussian splatting; bundle adjustment

Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · Advanced Vision and Imaging

Full text

1. Introduction

Recently, 3D Gaussian Splatting (3DGS) [1] has emerged as a pivotal technique for generating dense scene representations from sparse points, offering superior efficiency for real-time applications. A critical bottleneck in 3DGS lies in its initialization: conventional SfM approaches such as COLMAP [2] impose extensive computational overhead via bundle adjustment (BA), which optimizes the camera poses without explicit consideration of the 3DGS map quality. To address this, we introduce Local Gaussian Splatting Bundle Adjustment (LGSBA), a two-stage optimization framework that first refines poses via traditional BA and then employs a sliding window with a rendering-quality loss to enhance Gaussian map fidelity. Coupled with an ORB-SLAM-based online reconstruction pipeline via ROS, our system reduces the initialization time by a factor of 10 (a 90% reduction) while improving 3DGS map PSNR. The key contributions include the following:

1. A Tightly Coupled ORB-SLAM and 3DGS Optimization System: We first propose a ROS-based tightly coupled framework that injects real-time local bundle adjustment (Local BA) results from ORB-SLAM into the 3DGS optimization pipeline. By dynamically acquiring keyframe poses and point cloud updates, this framework enables incremental optimization of the Gaussian map, reducing the initialization time overhead by 90% compared to traditional COLMAP workflows. Leveraging ORB-SLAM’s real-time localization capabilities, the system establishes a closed loop between localization and mapping. This integration not only accelerates the 3DGS initialization but also ensures high-quality initial inputs for Gaussian map construction through real-time feature tracking and scene understanding.
2. Local Gaussian Splatting Bundle Adjustment (LGSBA): We propose an LGSBA optimization framework based on a sliding window, which dynamically refines local viewpoint poses by integrating a rendering-quality loss function that aggregates errors from all keyframes within the window (combining L1 loss and SSIM). The algorithm adaptively adjusts the Gaussian parameters and balances pose accuracy against map rendering quality, coupling the local structure refinement of the Gaussian map with camera pose optimization to mitigate map blurring caused by projection errors in various scenes. Experiments across three datasets (TUM-RGBD, Tanks and Temples, and KITTI) demonstrate an average PSNR improvement of 1.9 dB.
3. An Open-Source Codebase: The core algorithms, including ORB-SLAM initialization, LGSBA optimization, and the tightly coupled ROS framework, are open source and available at https://github.com/wla-98/worse-pose-but-better-3DGS (accessed on 29 June 2025), enabling the reproducibility of the proposed method and facilitating further research on 3D Gaussian splatting optimization. The repository includes implementation details for real-time initialization, sliding window-based bundle adjustment, and online map refinement, providing a comprehensive resource for the research community.

2. Related Work

2.1. 3D Gaussian Splatting

Benefiting from its explicit representation, short training time, and real-time rendering speed, 3DGS [1] has quickly outpaced Neural Radiance Fields [3] to become a prominent research focus in the field of 3D reconstruction. Numerous studies aim to enhance the pixel-level scene quality generated by 3DGS, which remains a key objective of 3DGS optimization.

Some studies [4,5] focus on optimizing the original parameters of 3DGS. However, parameter adjustments often require extensive experimentation and are closely tied to the specific characteristics of a given scene, limiting their generalizability. Other studies [6,7,8] improve rendering quality by incorporating depth estimation information, but acquiring depth data involves additional resource costs.

Despite these advancements, limited research has specifically explored how the initialization of camera poses and sparse point clouds impacts subsequent 3DGS results. Furthermore, most 3DGS reconstruction pipelines rely on COLMAP [2], which consumes substantial computational resources when processing large image sets, significantly reducing the efficiency of scene reconstruction. Addressing this limitation is a key objective of this article.

2.2. Visual SLAM

Direct SLAM methods [9,10,11,12] excel in handling low-texture and dynamic environments. However, due to their assumption of consistent grayscale intensity, these methods are highly sensitive to illumination changes and noise, which makes them less suitable for real-world applications. Feature-based SLAM [13,14,15,16,17,18] is highly robust, delivering excellent real-time performance and supporting multi-task parallel processing to construct high-precision maps. These systems integrate seamlessly with data from other sensors, providing accurate and reliable solutions for localization and mapping. However, they produce sparse point cloud maps, which hinder downstream tasks and demand considerable time for parameter tuning.

NeRF-based SLAM [3,19,20,21,22,23,24] excels in 3D dense modeling, rendering novel viewpoints, and predicting unknown regions, significantly enhancing 3D model generation and processing under optimal observational inputs. Nevertheless, these methods depend on large amounts of data and computational resources for neural network training, making real-time performance difficult to achieve. Furthermore, their implicit representations are less suited for downstream tasks.

Over the past two years, numerous studies have proposed SLAM systems based on 3DGS with explicit volumetric representations. Most of these systems [25,26,27,28,29,30,31,32] rely on 3D reconstruction frameworks, such as ORB-SLAM [14,15], DROID-SLAM [33], or DUSt3R [34], to provide camera poses and initial point clouds for initializing the Gaussian map. Alternatively, they require depth information corresponding to RGB images as a prior to constrain the map reconstruction process. MonoGS [35] stands out by eliminating dependency on other SLAM systems for initialization. It unifies 3D Gaussian representations as the sole three-dimensional model while simultaneously optimizing the Gaussian map, rendering novel viewpoints, and refining camera poses.

However, all such systems face a critical limitation: achieving both higher-precision camera pose optimization and dense map reconstruction using only a monocular camera is challenging. This article clarifies the underlying reasons behind these limitations and proposes a novel approach that combines ORB-SLAM initialization with LGSBA for pose optimization. This integration enhances the 3DGS framework, enabling it to produce better Gaussian maps and rendered images across various scenarios.

3. Method

3.1. ORB-SLAM Initialization

The ORB-SLAM initialization process selects keyframes and optimizes poses via local/global BA. The core optimization is:

[eqn]

Keyframe criteria and pose parameterization are detailed in Appendix A, while 3D points are extended with spherical harmonics for color representation.

3.2. 3DGS Fundamental Model

3DGS represents scenes using Gaussians:

[eqn]

with covariance parameterized as $[eqn]$ (Appendix B). Two-dimensional projection and $\alpha$-blending rendering are described in Appendix B.
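As a point of reference, the original 3DGS formulation [1] builds each covariance from a rotation and a per-axis scale, $\Sigma = R S S^{\top} R^{\top}$, which keeps it symmetric and positive semi-definite during optimization. The sketch below assumes that parameterization; the function names, the quaternion ordering (w, x, y, z), and the example values are illustrative and not taken from the paper's code.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def gaussian_covariance(quat, scale):
    """Sigma = R S S^T R^T; symmetric and PSD by construction."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    S = np.diag(np.asarray(scale, dtype=float))
    M = R @ S
    return M @ M.T

def gaussian_density(x, mean, cov):
    """Unnormalized Gaussian exp(-0.5 (x - mu)^T Sigma^-1 (x - mu))."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

# Example values chosen only for illustration.
cov = gaussian_covariance([0.9, 0.1, 0.2, 0.3], [0.5, 0.2, 0.1])
eigenvalues = np.linalg.eigvalsh(cov)  # squared scales, in ascending order
```

Optimizing the quaternion and scale instead of the raw matrix entries means a gradient step can never produce an invalid covariance.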

3.3. Local Gaussian Splatting Bundle Adjustment (LGSBA) and Mathematical Derivations

The LGSBA framework employs a sliding-window strategy to balance camera pose accuracy and Gaussian map rendering quality, rooted in the mathematical properties of the special Euclidean group $SE(3)$. For a keyframe pose $[eqn]$ and its Lie algebra $[eqn]$ , the left Jacobian is defined as:

[eqn]

forming the basis for gradient computation in pose optimization. Here, $SE(3)$ represents the group of rigid body transformations, combining rotation and translation, while $\mathfrak{se}(3)$ is its corresponding Lie algebra.

The original single-keyframe loss function is:

[eqn]

where $[eqn]$ is the pixel-wise L1 loss, SSIM measures structural similarity, and $[eqn]$ weights their contributions. LGSBA extends this to a multi-keyframe sliding window (e.g., seven frames: the current frame plus the three before and the three after), with total loss:

[eqn]

where W denotes the keyframe index set.
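The windowed loss can be sketched in a few lines. The example below assumes the weighting convention of the original 3DGS loss, $(1-\lambda)\,L_1 + \lambda\,(1-\mathrm{SSIM})$ with $\lambda = 0.2$, and replaces windowed SSIM with a single whole-image SSIM to stay short; the paper's exact form is not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def l1_loss(pred, gt):
    """Mean absolute pixel difference."""
    return float(np.mean(np.abs(pred - gt)))

def global_ssim(pred, gt, data_range=1.0):
    """SSIM over the whole image as one window (a simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = pred.mean(), gt.mean()
    var_x, var_y = pred.var(), gt.var()
    cov_xy = ((pred - mu_x) * (gt - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

def keyframe_loss(pred, gt, lam=0.2):
    """Single-keyframe loss; lam = 0.2 is an assumed default."""
    return (1.0 - lam) * l1_loss(pred, gt) + lam * (1.0 - global_ssim(pred, gt))

def window_indices(current, num_keyframes, half_width=3):
    """Keyframe index set W, clipped at the ends of the sequence."""
    lo = max(0, current - half_width)
    hi = min(num_keyframes - 1, current + half_width)
    return list(range(lo, hi + 1))

def window_loss(renders, targets, current, half_width=3, lam=0.2):
    """Sum of per-keyframe losses over the sliding window."""
    idx = window_indices(current, len(targets), half_width)
    return sum(keyframe_loss(renders[i], targets[i], lam) for i in idx)
```

With `half_width=3` the window holds seven keyframes away from the sequence ends and fewer near them, so the first and last keyframes are optimized against a shorter window.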

For gradient optimization on $[eqn]$ , the 2D projection derivative with respect to pose $[eqn]$ is:

[eqn]

with the pose derivative of 3D point $[eqn]$ derived via Lie algebra:

[eqn]

Here, $[eqn]$ denotes the skew-symmetric matrix of the 3D point $[eqn]$ , and I is the identity matrix.
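A numerical check makes this point Jacobian concrete. The sketch assumes a left perturbation $\exp(\xi^{\wedge})$ applied to the already transformed point $q$, with $\xi$ ordered as (translation, rotation), which gives the common form $[\,I \;\; -q^{\wedge}\,]$; the paper's ordering and sign convention may differ, and the function names are illustrative.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(phi):
    """Rodrigues formula: rotation matrix for the axis-angle vector phi."""
    theta = np.linalg.norm(phi)
    K = skew(phi)
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3)
            + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta ** 2 * (K @ K))

def point_jacobian(q):
    """Analytic 3x6 Jacobian of the perturbed point at xi = 0: [I, -skew(q)]."""
    return np.hstack([np.eye(3), -skew(q)])

def numeric_jacobian(q, eps=1e-6):
    """Forward differences, perturbing one se(3) component at a time."""
    J = np.zeros((3, 6))
    for i in range(6):
        xi = np.zeros(6)
        xi[i] = eps
        q_pert = so3_exp(xi[3:]) @ q + xi[:3]
        J[:, i] = (q_pert - q) / eps
    return J

q = np.array([0.3, -1.2, 2.0])  # an arbitrary point in the camera frame
max_abs_diff = np.abs(numeric_jacobian(q) - point_jacobian(q)).max()
```

The rotational block is $-q^{\wedge}$ because a small rotation $\phi$ moves the point by $\phi \times q = -q \times \phi$.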

Similarly, the covariance derivative is:

[eqn]

where the rotational component derivative is:

[eqn]

In this context, $[eqn]$ is the rotation matrix of keyframe $[eqn]$ , and $[eqn]$ denotes the k-th column of $[eqn]$ . The operator $[eqn]$ represents the derivative with respect to the pose $[eqn]$ , while the skew-symmetric matrix $[eqn]$ enables the conversion between vector and matrix representations for rotational operations.

The total loss gradient under LGSBA becomes:

[eqn]

with gradient components:

[eqn]

The LGSBA workflow has two stages:

1. Traditional BA Refinement: Conventional bundle adjustment optimizes poses for geometric accuracy, where $[eqn]$ represents the camera pose of keyframe $[eqn]$ , and $[eqn]$ are 3D map points.
2. Sliding-Window Optimization: $[eqn]$ refines poses to enhance rendering fidelity, leveraging $[eqn]$ gradient properties to balance pose precision and map quality, as validated in Section 5.1.

3.4. System Workflow Based on the Diagram

As illustrated in Figure 1, the complete system workflow is structured as follows:

3.4.1. Input and Frontend Initialization (ORB-SLAM3 Leading)

The system takes RGB images as input and executes the Tracking, Local Mapping, and Loop Closing processes via ORB-SLAM3:

1. Tracking: Keyframes (Key Frame) are selected from the incoming monocular frames (Frame Mono), providing the basic data for subsequent optimization.
2. Local Mapping: This optimizes keyframe poses through local bundle adjustment (Local BA) and refines data with keyframe culling (Key Frame Culling).
3. Loop Closing: This triggers global bundle adjustment (Global BA) upon loop detection (Loop Detected) to correct accumulated errors, outputting optimized keyframe images, poses, and sparse point clouds (Node Image, Node Pose, Node 3D points).

3.4.2. Data Interaction and Middleware (ROS Bridging)

Leveraging ROS nodes, the Local Mapping thread of ORB-SLAM3 publishes real-time local BA results (keyframe images, poses, map points). After the Loop Closing thread completes global BA, data updates are triggered, enabling tightly coupled interaction between ORB-SLAM3 and the 3DGS system.

3.4.3. 3DGS Scene Construction and Optimization

Upon receiving data from ROS, the 3DGS system executes sequential operations tailored to ORB-SLAM’s loop-closing status:

1. Scene Creation: This generates scene information (Scene Info) by combining camera parameters (Camera Info) and Gaussian initialization (Gaussian Initial), initializing the 3DGS environment.
2. Training Strategy Selection:
(a) Incremental Training (No Global Loop): When ORB-SLAM has not detected a global loop (partial keyframes processed), LGSBA optimization is employed. It uses the sliding-window loss ($[eqn]$) to refine Gaussian parameters and camera poses via backpropagation and updates the map incrementally with new keyframes to maintain local consistency.
(b) Random Training (Global Loop Detected): After global loop closure (all keyframes processed), the original 3DGS random optimization is selected. It uses random sampling of scene cameras for global map refinement and corrects accumulated errors to ensure global consistency.
3. Scene Update Loop: This continuously refines Gaussian map parameters during incremental training, terminates incremental training upon global loop detection (which triggers random optimization), and outputs the final 3DGS map and rendered RGB images after global optimization.
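The strategy switch described above reduces to a choice of which cameras to train on at each step. The following is a minimal sketch under stated assumptions: keyframes are indexed in arrival order, the window is centred on a chosen keyframe and clipped at the sequence ends, and one camera is sampled per step after loop closure. The function name and signature are hypothetical.

```python
import random

def select_training_cameras(num_keyframes, center, loop_closed,
                            half_width=3, rng=None):
    """Return the keyframe indices to train on for one optimization step."""
    if num_keyframes <= 0:
        return []
    if loop_closed:
        # Random training: sample one camera from the whole scene.
        rng = rng or random
        return [rng.randrange(num_keyframes)]
    # Incremental training: sliding window around the chosen keyframe.
    lo = max(0, center - half_width)
    hi = min(num_keyframes - 1, center + half_width)
    return list(range(lo, hi + 1))

# Before loop closure: a seven-keyframe window around keyframe 10.
window = select_training_cameras(20, 10, loop_closed=False)

# After loop closure: a single randomly sampled camera.
sampled = select_training_cameras(20, 10, loop_closed=True,
                                  rng=random.Random(0))
```

Keeping the selection in one function makes the switch a single flag flip when the frontend reports that global BA has finished.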

3.4.4. Loop Closure and Robustness Enhancement

Utilizing ORB-SLAM3’s global BA capability, the system introduces randomly selected scene cameras for training post-loop closure. This compensates for local optimization biases, enhancing both system robustness and 3DGS map reconstruction quality.

In summary, the system completes frontend pose and sparse map optimization via ORB-SLAM3, enables real-time data transfer through ROS, and executes Gaussian scene construction with LGSBA/3DGS hybrid optimization, forming a complete “front-end vision–middleware communication–back-end reconstruction” loop. The flowchart (Figure 1) intuitively presents the module interaction logic and data flow.

4. Dataset and Metrics

All experiments were conducted on a Windows 11 WSL2 system, equipped with an Intel Core i7-11700K CPU (Intel, Santa Clara, CA, USA) and an NVIDIA RTX 3090 GPU (Nvidia, Santa Clara, CA, USA).

4.1. Datasets Description

We evaluated our approach on three publicly available datasets: TUM-RGBD, Tanks and Temples, and KITTI. These datasets offer complementary advantages in terms of scene types, motion patterns, and evaluation metrics, which enabled a comprehensive assessment of the proposed method under varied conditions.

4.1.1. TUM-RGBD Dataset (TUM)

The TUM-RGBD dataset is widely used for indoor SLAM research. It contains multiple sequences of both dynamic and static indoor scenes captured using a handheld RGB-D camera. Ground-truth trajectories with high precision are provided, and common evaluation metrics include Absolute Trajectory Error (ATE) and Relative Pose Error (RPE). This dataset is ideal for assessing the stability of feature tracking in the presence of dynamic disturbances and the precision of backend optimization.

4.1.2. Tanks and Temples Dataset (TANKS)

The Tanks and Temples dataset is designed for large-scale outdoor 3D reconstruction tasks. It features high-resolution scenes with complex geometry and detailed textures. This dataset is frequently used to evaluate multi-view 3D reconstruction and dense scene modeling methods, allowing us to assess the capability of our 3DGS approach in modeling radiance fields at different scales.

4.1.3. KITTI Dataset (KITTI)

The KITTI dataset was collected from a vehicle-mounted platform and is primarily targeted at SLAM and visual odometry research in real-world traffic scenarios. It provides continuous image sequences along with high-precision GPS/IMU data, making it suitable for evaluating the system’s ability to suppress accumulated errors over long-term operations and its robustness in challenging road environments.

The complementary nature of these datasets, in terms of indoor/outdoor settings, different motion modalities (handheld, vehicle, aerial), and diverse evaluation metrics (ATE/RPE), enables a systematic evaluation of:

1. The robustness of the SLAM frontend under dynamic disturbances;
2. The radiance field modeling capabilities of 3DGS in scenes of varying scales;
3. The suppression of cumulative errors during long-term operations.

4.2. Evaluation Metrics

To quantitatively analyze the reconstruction results, we adopted several evaluation metrics that measure the differences between the reconstructed images and ground truth, the accuracy of camera poses, and the geometric precision. The metrics used are as follows:

1. L1 Loss: This metric evaluates image quality by computing the pixel-wise absolute difference between the reconstructed image and the ground truth image. A lower L1 loss indicates a smaller discrepancy and, consequently, better reconstruction quality:

[eqn]

where N is the total number of pixels.

2. PSNR: PSNR measures the signal-to-noise ratio between the reconstructed and ground truth images, expressed in decibels (dB). A higher PSNR value indicates better image quality. It is computed as:

[eqn]

with R being the maximum possible pixel value (typically 255 for 8-bit images) and $[eqn]$ defined as:

[eqn]

3. SSIM: SSIM measures the structural similarity between two images by comparing luminance, contrast, and structure. Its value ranges from −1 to 1, with values closer to 1 indicating higher similarity:

[eqn]

where $[eqn]$ are the means, $[eqn]$ the variances, and $[eqn]$ the covariance of images x and y, with $[eqn]$ and $[eqn]$ being small constants.

4. LPIPS: LPIPS assesses perceptual similarity between images using deep network features. It is computed as:

[eqn]

where $[eqn]$ represents the feature map extracted from the k-th layer of a deep convolutional network, and K is the total number of layers. A lower LPIPS indicates a smaller perceptual difference.

5. APE: APE quantifies the geometric precision of the reconstructed camera poses by computing the Euclidean distance between the translational components of the estimated and ground truth poses. For N poses, it is defined as:

[eqn]

If rotation errors are also considered, APE can be extended as:

[eqn]

where $[eqn]$ is the axis-angle error for rotation, and $[eqn]$ is a weighting factor (typically set to 1).

6. RMSE: RMSE measures the average error between the reconstructed and ground truth camera poses, computed as:

[eqn]

where $[eqn]$ and $[eqn]$ denote the reconstructed and ground truth poses for frame i, respectively.
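The three image-quality metrics with closed-form definitions (L1, PSNR, SSIM) can be sketched as follows. The sketch assumes single-channel images, the usual SSIM constants $C_1 = (0.01R)^2$ and $C_2 = (0.03R)^2$, and uniform 7×7 windows instead of the Gaussian window used in many implementations; function names are illustrative. LPIPS is omitted because it requires a pretrained network.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def l1_loss(pred, gt):
    """Mean absolute pixel difference (cast first to avoid uint8 wrap-around)."""
    return float(np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64))))

def mse(pred, gt):
    """Mean squared pixel difference."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(pred, gt, max_value=255.0):
    """PSNR in dB: 10 * log10(R^2 / MSE); infinite for identical images."""
    err = mse(pred, gt)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

def ssim(x, y, data_range=1.0, win=7):
    """Mean SSIM over uniform win x win windows of two grayscale images."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    xw = sliding_window_view(x, (win, win)).reshape(-1, win * win)
    yw = sliding_window_view(y, (win, win)).reshape(-1, win * win)
    mu_x, mu_y = xw.mean(axis=1), yw.mean(axis=1)
    var_x, var_y = xw.var(axis=1), yw.var(axis=1)
    cov_xy = ((xw - mu_x[:, None]) * (yw - mu_y[:, None])).mean(axis=1)
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float((num / den).mean())
```

Because PSNR is logarithmic in the MSE, doubling the per-pixel error lowers it by about 6.02 dB regardless of the starting point.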

These metrics provide a comprehensive evaluation of the image reconstruction quality, visual fidelity, and geometric accuracy. In our experiments, we used these metrics to quantitatively compare different methods and provide a solid basis for performance evaluation.
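For the pose metrics, a short sketch shows how APE and RMSE relate. It assumes the estimated and ground-truth trajectories are already aligned and time-associated, that the rotation term is the angle of the relative rotation in radians, and that RMSE is taken over translational errors; these choices and the function names are assumptions rather than the paper's exact definitions.

```python
import numpy as np

def rotation_angle(R_est, R_gt):
    """Axis-angle magnitude (radians) of the relative rotation R_gt^T R_est."""
    R_rel = R_gt.T @ R_est
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def ape(t_est, t_gt, R_est=None, R_gt=None, lam=1.0):
    """Mean translational error, optionally plus lam times the rotation error."""
    trans = np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float), axis=1)
    if R_est is None or R_gt is None:
        return float(trans.mean())
    rot = np.array([rotation_angle(a, b) for a, b in zip(R_est, R_gt)])
    return float((trans + lam * rot).mean())

def rmse(t_est, t_gt):
    """Root mean square of the per-frame translational errors."""
    err = np.linalg.norm(np.asarray(t_est, float) - np.asarray(t_gt, float), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

RMSE is never smaller than the mean error and grows faster with outliers, so a large gap between the two suggests a few badly estimated frames.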

5. Experiment

Section 5 is structured to validate the two primary innovations of this study: (1) the Local Gaussian Splatting Bundle Adjustment algorithm, and (2) the ROS-based tightly coupled ORB-SLAM and 3DGS system, which integrates real-time initialization with dynamic optimization. Each experiment directly demonstrates how these innovations enhance 3DGS reconstruction quality and efficiency.

5.1. Experiment 1: LGSBA Effectiveness Verification via 3DGS Training Comparison

This experiment aimed to evaluate the LGSBA algorithm by comparing three training strategies under the ORB-SLAM initialization framework: original 3DGS training, Scaffold-GS, and our LGSBA-enhanced approach. The core objective was to demonstrate that LGSBA, when initialized by ORB-SLAM, optimizes 3DGS rendering quality by dynamically balancing ORB-SLAM’s pre-optimized camera poses (refined via its built-in traditional BA) and Gaussian map fidelity, even when introducing controlled pose errors. This directly validates the innovation of LGSBA’s two-stage optimization framework:

Leveraging ORB-SLAM’s Native BA: Initial refinement of camera poses using ORB-SLAM’s traditional bundle adjustment.
Sliding-Window Joint Optimization: Subsequent refinement of both poses and Gaussian parameters using rendering-quality loss ( $[eqn]$ ), prioritizing map fidelity while maintaining acceptable pose accuracy.

The experimental design focused solely on the ORB-SLAM initialization paradigm. The results were expected to show that LGSBA improves rendering quality in real-time scenarios, highlighting its superiority in integrating pose optimization with 3DGS rendering quality under the ORB-SLAM framework.

5.1.1. System Workflow of 3DGS Reconstruction

As shown in Figure 2, the workflow was structured to explicitly validate the tight coupling of ORB-SLAM and LGSBA, a key innovation of the proposed system:

1. ORB-SLAM Preprocessing: Input RGB sequences are processed to select keyframes based on tracking quality and scene dynamics, leveraging ORB-SLAM’s real-time localization capabilities. This step outputs optimized camera poses $[eqn]$ and 3D point clouds $[eqn]$ , demonstrating the efficiency of ORB-SLAM initialization compared to COLMAP.
2. Gaussian Map Initialization: Three-dimensional points are converted to Gaussian parameters:

[eqn]

where $[eqn]$ is the spatial coordinate, c is the spherical harmonic coefficient for color encoding, and $[eqn]$ is the opacity. This step forms the basis for 3DGS scene representation, highlighting the innovation of using ORB-SLAM’s output for rapid initialization.
3. LGSBA Sliding-Window Optimization: A seven-frame sliding window is employed for joint optimization, integrating the rendering-quality loss function:

[eqn]

Here, $[eqn]$ balances the contributions of pixel-wise L1 loss and structural similarity (SSIM), directly reflecting the innovation of LGSBA in coupling pose optimization with rendering quality. The sliding-window strategy dynamically refines local viewpoints, mitigating map blurring caused by projection errors.
4. Quantitative Evaluation: Rendered images are compared against ground truth using metrics (PSNR, SSIM, LPIPS, APE, RMSE) to quantify the improvement in 3DGS reconstruction quality. This evaluation directly measures the impact of LGSBA’s optimization on both visual fidelity and geometric accuracy, aligning with the study’s core innovation of enhancing 3DGS rendering through pose-map co-optimization.
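The Gaussian map initialization step can be illustrated with a short sketch. It assumes conventions from the reference 3DGS implementation: the point color is stored in the zeroth-order spherical-harmonic coefficient as $(\text{rgb} - 0.5)/C_0$, higher-order coefficients start at zero, and every Gaussian starts at opacity 0.1. The function name and the dictionary layout are hypothetical, and scale and rotation initialization are left out.

```python
import numpy as np

# Zeroth-order real spherical-harmonic constant, 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814

def init_gaussians(points, colors, init_opacity=0.1, sh_degree=3):
    """Build initial Gaussian parameters from a sparse coloured point cloud.

    points: (N, 3) positions from the SLAM frontend.
    colors: (N, 3) RGB values in [0, 1].
    """
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    n = points.shape[0]
    num_coeffs = (sh_degree + 1) ** 2
    sh = np.zeros((n, num_coeffs, 3))
    sh[:, 0, :] = (colors - 0.5) / SH_C0  # DC term reproduces the point colour
    opacity = np.full((n, 1), init_opacity)
    return {"mean": points, "sh": sh, "opacity": opacity}

# Two example points with made-up coordinates and colours.
pts = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
cols = np.array([[0.5, 0.5, 0.5], [1.0, 0.0, 0.25]])
gaussians = init_gaussians(pts, cols)
```

Evaluating only the DC term, `sh[:, 0] * SH_C0 + 0.5`, returns the input colors, so the initial render matches the sparse map before any training.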

5.1.2. Quantitative Analysis

As shown in Table 1, under 30K iterations, LGSBA achieves significant improvements: +1.9 dB average PSNR, −0.012 L1 loss, +0.068 SSIM, and −0.081 LPIPS compared to the original 3DGS with ORB-SLAM initialization. Notable gains include a 5.05 dB PSNR improvement in the TANKS-Family scene, 2.11 dB in TANKS-Train, and 7.65 dB in TANKS-Caterpillar, highlighting its effectiveness in outdoor environments.

5.1.3. Qualitative Analysis

Figure 3 presents a comprehensive comparison of rendering quality across different methods and scenes. Our LGSBA-enhanced approach consistently demonstrated superior reconstruction fidelity across all datasets, particularly excelling in preserving fine-grained details and minimizing artifacts in complex scenarios. The key observations include the following:

1. Detail Preservation: LGSBA maintains sharper object contours (e.g., machinery edges in TANKS-Caterpillar) and consistent surface textures (e.g., ground patterns in TUM-fg1-floor) compared to baseline methods.
2. Artifact Reduction: Rendering artifacts (blurring, floating points) are noticeably reduced in occluded regions, especially in the TANKS-Family scenes.
3. Color Consistency: Color transitions and lighting are reproduced more accurately, particularly on the metallic surfaces of the TANKS-Horse model.
4. Geometric Integrity: Depth perception and structural coherence improve in the zoomed segments (red boxes), validating LGSBA’s effectiveness in local map refinement.

5.1.4. Camera Pose Evaluation

This section evaluates LGSBA’s impact on pose estimation and its relationship with 3DGS rendering quality. The quantitative analysis of pose errors (Table 2) revealed a consistent trade-off: while LGSBA optimization increased Absolute Pose Error (APE) and Root Mean Square Error (RMSE) in most scenarios, e.g., APE in TUM-fg2-desk rises from 0.46 cm to 4.17 cm, it simultaneously boosted rendering quality, with Peak Signal-to-Noise Ratio (PSNR) increasing by up to 7.66 dB in TANKS-Caterpillar. This inverse relationship challenges conventional assumptions that a higher pose accuracy guarantees a better reconstruction quality.

Trajectory visualizations (Figure 4) provide spatial context: LGSBA-optimized trajectories show increased deviation from ground truth in structured environments like TUM-fg1-xyz. Crucially, these minor pose sacrifices yielded significant visual gains—average PSNR improved by 1.9 dB—making LGSBA particularly valuable for applications prioritizing visual fidelity over metric precision, including immersive VR/AR, dynamic scene modeling, and real-time robotic perception.

5.2. Online Integration of ORB-SLAM with 3DGS Map Optimization by ROS System

The proposed system establishes a tightly coupled pipeline through ROS middleware, enabling real-time data exchange between ORB-SLAM3 and 3DGS optimization modules. As illustrated in Figure 1 and specifically in Section 3.4, this integration comprises three core components:

5.2.1. ORB-SLAM3 Frontend Processing

The frontend handles real-time visual odometry and mapping:

1. Tracking: Monocular frames are processed for feature extraction and keyframe selection based on scene dynamics and tracking quality.
2. Local Mapping: This performs local bundle adjustment (Local BA) to optimize keyframe poses and applies keyframe culling to remove redundancies.
3. Loop Closing: This detects loop closures to trigger global bundle adjustment (Global BA), correcting accumulated drift in poses and 3D points.

This pipeline outputs optimized keyframe images $[eqn]$ , camera poses $[eqn]$ , and sparse 3D point clouds $[eqn]$ , with initialization about 10 times faster than in offline COLMAP-based systems.

5.2.2. ROS Middleware Bridging

Real-time data exchange is implemented through ROS topics/services:

1. The Local Mapping thread publishes incremental updates (keyframe poses/images/map points).
2. The Loop Closing thread triggers synchronization events upon global BA completion.
3. Custom ROS nodes ensure asynchronous communication without blocking SLAM processes.

This design maintains ORB-SLAM3’s real-time performance while providing fresh inputs for 3DGS optimization.
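The non-blocking hand-off between frontend and backend can be illustrated without ROS. The sketch below is a stand-in built on Python's standard library, not the ROS API and not the authors' node code: the class names, the message fields, and the drop-oldest policy when the buffer is full are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass, field

@dataclass
class KeyframeUpdate:
    """Hypothetical message carrying one keyframe from the frontend."""
    keyframe_id: int
    pose: list = field(default_factory=list)    # e.g. a flattened 4x4 matrix
    points: list = field(default_factory=list)  # sparse map points
    global_ba: bool = False                      # True after loop closure

class KeyframeChannel:
    """Bounded buffer whose publish() never blocks the producer."""

    def __init__(self, maxsize=64):
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def publish(self, update):
        """Enqueue an update; when full, discard the oldest pending one."""
        while True:
            try:
                self._q.put_nowait(update)
                return
            except queue.Full:
                try:
                    self._q.get_nowait()
                    self.dropped += 1
                except queue.Empty:
                    pass

    def drain(self):
        """Return all pending updates without waiting for new ones."""
        updates = []
        while True:
            try:
                updates.append(self._q.get_nowait())
            except queue.Empty:
                return updates

# The frontend publishes five keyframes into a buffer of three slots.
channel = KeyframeChannel(maxsize=3)
for i in range(5):
    channel.publish(KeyframeUpdate(keyframe_id=i))
received_ids = [u.keyframe_id for u in channel.drain()]
```

Dropping the oldest update keeps the SLAM thread responsive at the cost of occasionally skipping a keyframe; a system that must train on every keyframe would use an unbounded buffer instead.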

5.2.3. 3DGS Optimization with Dynamic Strategy Switching

The backend dynamically adapts training strategies based on loop closure status:

Incremental Training

1. Activates when no global loop is detected (partial keyframes processed);
2. Employs LGSBA with seven-frame sliding-window optimization;
3. Minimizes the rendering-quality loss $[eqn]$ (Equation (5)) via backpropagation;
4. Jointly optimizes Gaussian parameters $[eqn]$ and poses $[eqn]$ :

[eqn]

Random Training

1. Activates after global loop closure (all keyframes processed);
2. Switches to original 3DGS random optimization;
3. Uses scene-wide camera sampling for global error correction;
4. Ensures consistency across large-scale environments.

5.2.4. System Advantages

This integrated framework provides the following:

1. Real-Time Capability: 10× faster initialization than COLMAP-3DGS systems.
2. Adaptive Optimization: Balances local consistency (LGSBA) and global accuracy (random training).
3. Robustness: Loop closure triggers map-wide error correction.
4. Online Performance: It achieves >25 FPS end-to-end throughput on an NVIDIA RTX 3090 GPU with
(a) ORB-SLAM frontend: 30+ FPS (CPU processing);
(b) 3DGS optimization: 25 FPS (GPU rendering).
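The 10× speed-up quoted here and the 90% reduction in initialization time quoted elsewhere in the paper express the same ratio, as a one-line calculation shows. The symbols $t_{\text{COLMAP}}$ and $t_{\text{ours}}$ are introduced only for this illustration.

```latex
t_{\text{ours}} = \frac{t_{\text{COLMAP}}}{10}
\quad\Longrightarrow\quad
\frac{t_{\text{COLMAP}} - t_{\text{ours}}}{t_{\text{COLMAP}}}
  = 1 - \frac{1}{10} = 0.9 = 90\%
```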

5.2.5. Experimental Validation of System Feasibility

The laboratory scene experiments demonstrated the superiority of our online integrated system over traditional offline reconstruction methods. The quantitative results in Table 3 and qualitative visualizations in Figure 5 and Figure 6 collectively validate the system’s feasibility and advantages.

Quantitative Analysis

The comparison in Table 3 reveals consistent improvements across key metrics:

1. At 7K Iterations: Our online method achieved a +0.27 dB PSNR gain (+1.3%), a +0.003 SSIM improvement, and a 0.030 reduction in LPIPS compared to offline reconstruction, while maintaining a comparable L1 loss.
2. At 30K Iterations: Significant quality improvements emerged:
(a) A +0.50 dB PSNR gain (+2.1%);
(b) A +0.012 SSIM improvement;
(c) A 0.042 reduction in LPIPS (15.2% lower).
3. Training Efficiency: The online system reached near-optimal quality (30K-level metrics) at just 7K iterations, demonstrating accelerated convergence.

Qualitative Analysis

Visual evidence further confirmed the system’s effectiveness:

1. Rendering Fidelity: Figure 5 confirms
(a) Photorealistic novel view synthesis;
(b) Accurate lighting and material reproduction;
(c) Clear structural details in close-up views.
2. Gaussian Map Quality: Figure 6 demonstrates
(a) Precise geometric reconstruction of laboratory equipment;
(b) Detailed surface representation (e.g., texture on cylindrical objects);
(c) Minimal floating artifacts in complex areas.
3. Geometric Consistency: Figure 7 shows accurate camera pose estimation and sparse point cloud generation by ORB-SLAM, providing reliable initialization for 3DGS.

Conclusion on System Feasibility

The experimental results validate three critical aspects of our integrated system:

1. Real-Time Capability: It achieved a >25 FPS throughput while maintaining reconstruction quality.
2. Quality Superiority: It outperformed offline methods in image-quality metrics (PSNR/SSIM/LPIPS).
3. Operational Robustness: It performed consistently in practical indoor environments.

These findings demonstrate that our ROS-based integration of ORB-SLAM with 3DGS optimization successfully bridges the gap between real-time SLAM and high-quality neural rendering, establishing a feasible solution for 3D reconstruction applications.

6. Conclusions

This study proposes 3D Gaussian splatting optimization based on Local Gaussian Splatting Bundle Adjustment, a two-stage optimization framework that bridges ORB-SLAM and 3DGS via ROS middleware. The core contributions are as follows:

1. Tightly Coupled System: A ROS-based pipeline integrating ORB-SLAM’s real-time Local Bundle Adjustment with 3DGS optimization. This reduces initialization time by 90% versus COLMAP while improving average PSNR by 1.9 dB across the TUM-RGBD, Tanks and Temples, and KITTI datasets.
2. LGSBA Algorithm: A sliding-window strategy jointly optimizes rendering-quality loss ($[eqn]$) and camera poses. This balances geometric accuracy with perceptual fidelity, mitigating blurring artifacts induced by projection errors and enhancing detail preservation in complex scenes.
3. Open-Source Implementation: The released codebase (https://github.com/wla-98/worse-pose-but-better-3DGS, accessed on 29 June 2025) supports reproducibility, providing tools for real-time initialization and online map refinement.

The experiments validated LGSBA’s superiority in visual quality (e.g., +5.05 dB PSNR in TANKS-Family) and operational efficiency, establishing a practical solution for 3D reconstruction across diverse scenarios.

7. Limitations

Despite advancements, critical limitations remain:

1. Dynamic Environments: Reliance on ORB-SLAM’s feature tracking causes performance degradation with moving objects (e.g., pedestrians in TUM-RGBD). Dynamic elements introduce tracking errors, leading to inconsistent Gaussian map updates and rendering artifacts.
2. Illumination Sensitivity: ORB-SLAM’s susceptibility to lighting variations (e.g., outdoor shadows or flickering indoor lights) reduces pose estimation accuracy.
3. Insufficient Failure Analysis: While evaluated on diverse datasets, systematic examination of edge cases (e.g., low-texture scenes or extreme lighting) is absent. Without quantitative fault diagnosis, robustness boundaries remain unquantified.
4. Computational Overhead: Despite faster initialization, optimizing high-dimensional Gaussian parameters (covariance matrices, spherical harmonics) in ultra-large scenes imposes significant GPU memory demands, constraining real-time performance.

Future work should address these limitations by integrating dynamic object segmentation, developing illumination-invariant features, and conducting rigorous failure mode analyses.

Bibliography (37)

1. Kerbl B., Kopanas G., Leimkühler T., Drettakis G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 2023, 42, 139–153. doi:10.1145/3592433
2. Schonberger J.L., Frahm J.M. Structure-From-Motion Revisited. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
3. Mildenhall B., Srinivasan P.P., Tancik M., Barron J.T., Ramamoorthi R., Ng R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. doi:10.1145/3503250
4. Cheng K., Long X., Yang K., Yao Y., Yin W., Ma Y., Wang W., Chen X. GaussianPro: 3D Gaussian splatting with progressive propagation. Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024.
5. Zhang J., Zhan F., Xu M., Lu S., Xing E. FreGS: 3D Gaussian splatting with progressive frequency regularization. Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 21424–21433.
6. Chung J., Oh J., Lee K.M. Depth-regularized optimization for 3D Gaussian splatting in few-shot images. Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; pp. 811–820.
7. Li J., Zhang J., Bai X., Zheng J., Ning X., Zhou J., Gu L. DNGaussian: Optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization. Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 20775–20785.
8. Zhu Z., Fan Z., Jiang Y., Wang Z. FSGS: Real-time few-shot view synthesis using Gaussian splatting. Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September 2024; Springer: Berlin/Heidelberg, Germany, 2025; pp. 145–163.