Edge–Point Cloud Fusion for Geometric Fitting of Cylinder Parameters Using Single-View RGB-D Data
Huayan Zhang, Jiaxin Liu, Zhongkui Wang

TL;DR
This paper introduces a method that combines 2D edge information with 3D point cloud data to more accurately fit cylinder parameters from a single RGB-D camera view.
Contribution
The novel approach fuses 2D edge constraints with point cloud data to improve cylinder parameter fitting in RGB-D data.
Findings
The method achieves significant improvements in fitting accuracy for cylinders in real-world RGB-D data.
Incorporating 2D edge information reduces the impact of noise in point cloud data.
The approach demonstrates enhanced robustness compared to point cloud-only methods.
Abstract
Cylinders are common in both industrial and daily settings. Accurate geometric fitting of their parameters, including position, orientation, and radius, is important in real-world perception tasks and industrial applications. At present, consumer-level RGB-D cameras provide three-dimensional (3D) point cloud data with acceptable accuracy and are widely adopted in various sensing applications. Consequently, this task is typically formulated as a geometric fitting problem based on point cloud data. However, point cloud data acquired from such sensors often contain noise, particularly when scanning curved surfaces, which directly degrades the performance of point cloud-based fitting methods. In this paper, we propose an edge–point cloud fusion approach for the geometric fitting of cylinder parameters from single-view RGB-D data. Our approach leverages two-dimensional (2D) image-domain edge…
Funding: JSPS KAKENHI
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image and Object Detection Techniques · Robotics and Sensor-Based Localization
1. Introduction
Cylinders are fundamental geometric primitives that are common in both industrial (pipes, tanks, etc.) and everyday (cans, bottles, etc.) environments. Accurate estimation of their parameters (orientation, position, and radius) is important for industrial applications and perception tasks. This capability ensures the reliable reconstruction of cylindrical structures, such as modeling pipeline plants [1] and Building Information Modeling (BIM) [2,3], while also serving as a basis for robotic manipulation [4,5,6]. Point cloud data provide 3D geometry information and represent a primary data source for such applications. In this context, the estimation of cylinder parameters is typically formulated as a geometric fitting task based on point cloud data, making it a fundamental problem in geometric modeling.
Traditional methods have been widely adopted in prior research works, which mainly include Random Sample Consensus (RANSAC) [7], the Hough transform [8,9], Principal Component Analysis (PCA) [10,11], and least squares-based fitting methods [12,13,14,15]. While these methods achieve good performance on well-conditioned point cloud data, maintaining accuracy on complex real-world data remains challenging. This becomes more noticeable in practical applications where point cloud data are acquired by consumer-level 3D scanners such as RGB-D cameras. Due to their low cost and acceptable accuracy, these types of sensors have become widely adopted; however, point cloud data from such sensors typically contain noise [16,17], particularly for complex curved surfaces. This severely affects the performance of traditional methods, leading to axial drift, unreliable radius estimation, and inaccurate model fitting.
Because traditional methods are sensitive to such sensing conditions, researchers have made efforts to extend the classical framework. These extensions mainly include integrating curvature information into the voting process [9], improving the normal estimation [18], suppressing unreliable depth measurements near object boundaries [1], and clustering-based filtering strategies [19]. These methods continue to follow multi-stage frameworks in which the central axis and the radius are estimated step by step. Errors introduced in early stages therefore propagate and accumulate in later stages, which restricts the upper bound on overall performance. To address this limitation, Zhang et al. [15] proposed a least squares-based optimization framework that jointly optimizes all cylinder parameters; although this method achieves a better fit to the input point cloud data, parameter estimation accuracy is still influenced by the complex noise in real-world data.
While depth measurements are susceptible to noise, color images from RGB-D cameras contain complementary geometric information. In view of this observation, Kawagoshi et al. [20] used edge cues in addition to point clouds for cylinder fitting. However, their work focuses on radius estimation and relies on a viewpoint-specific modeling assumption, making the method less generalizable. Therefore, it is essential to investigate more effective methods for integrating multimodal geometric information to fit cylinder parameters in real-world sensing scenarios.
In this paper, we propose an optimization-based geometric fitting method for estimating cylinder parameters from single-view RGB-D observations. Our method is designed as a backend geometric refinement module, assuming that cylindrical regions are available from upstream pipelines and focusing on reducing the effect of noisy point cloud data on parameter estimation. Unlike prior work [20], our formulation explicitly considers edge alignment constraints derived from the projected cylinder geometry. We introduce a complementary modality fusion strategy that combines 3D point measurements with image-domain edge information within a unified optimization framework, allowing reliable edge information to compensate for deviations in the estimated parameters. To assess the effectiveness of the proposed method, we evaluate it on real-world RGB-D data under controlled settings. The experimental results show that our approach improves both accuracy and robustness over point cloud-only baselines. The contributions of this paper are as follows:
- We propose an edge–point cloud fusion method for geometric fitting of cylinder parameters.
- We present a unified fusion formulation and an optimization procedure to jointly estimate all cylinder parameters under constraints derived from both point measurements and edge observations.
- We validate the effectiveness of our approach and demonstrate significant performance improvements on real-world RGB-D data.
The remainder of this paper is organized as follows: Section 2 and Section 3 provide the notations, camera model, and geometric formulations of the cylinder; Section 4 details the proposed edge–point cloud fusion method for cylinder parameter estimation; Section 5 provides comprehensive experiments on robotic acquisition datasets and real-world pipe scenarios; Section 6 discusses the proposed method; finally, Section 7 concludes the paper and outlines future work.
2. Notations and Preliminaries
2.1. Notations
We use lowercase and uppercase letters for scalars (e.g., ), bold lowercase letters for column vectors (e.g., ), and bold uppercase letters for matrices (e.g., ). For a vector , its i-th entry is denoted by . For a matrix , denotes its i-th row and j-th column scalar, while denotes its i-th column vector. The operator denotes the transpose, denotes the inverse, and denotes the norm. The -normalized vector is defined as . The operator converts a vector into its homogeneous coordinate representation, e.g., . The identity matrix of size is . The zero matrix of size is . The 2D and 3D rotation groups are denoted by and , respectively. The capitalized exponential map maps a vector element to its corresponding element in the group space [21]. For and groups, they are defined as follows:
where , , and denotes the skew-symmetric operator applied to a 3D vector :
Finally, we denote the depth image by , where represents the image domain.
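The operators named above can be sketched in NumPy. This is a minimal illustration under standard conventions (Rodrigues' formula for the 3D case); the function names `skew`, `exp_so2`, and `exp_so3` are hypothetical and are not taken from the paper.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w equals np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def exp_so2(theta):
    """Map a scalar angle to a 2D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def exp_so3(phi):
    """Map a rotation vector (unit axis times angle) to a 3D rotation matrix."""
    phi = np.asarray(phi, dtype=float)
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.eye(3) + skew(phi)  # first-order approximation near zero
    a = skew(phi / angle)
    return np.eye(3) + np.sin(angle) * a + (1.0 - np.cos(angle)) * (a @ a)

# A quarter turn about the z-axis maps the x-axis onto the y-axis.
R = exp_so3(np.array([0.0, 0.0, np.pi / 2]))
```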
2.2. Camera Model
As the point cloud data used for cylinder fitting are reconstructed from depth images, we briefly review the camera model. We assume a well-calibrated RGB-D camera such that the corresponding color and depth images are registered and rectified. Given a pixel coordinate $(u, v)$ and its depth measurement $z$, the 3D point $\mathbf{p}$ is recovered via the inverse projection mapping under the standard pinhole camera model [22]:
$$\mathbf{p} = z\,\mathbf{K}^{-1}\,[u,\ v,\ 1]^{\top},$$
where $\mathbf{K}$ denotes the camera intrinsic matrix:
$$\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},$$
where $f_x, f_y$ are the focal lengths and $(c_x, c_y)$ represent the coordinates of the principal point.
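As a minimal sketch of the inverse projection under the standard pinhole model, the following NumPy code back-projects a single pixel and a masked depth image. The function names and the intrinsic values used in the example are illustrative assumptions, not the calibration used in the paper.

```python
import numpy as np

def make_intrinsics(fx, fy, cx, cy):
    """Build the 3x3 pinhole intrinsic matrix."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def back_project(K, u, v, depth):
    """Recover the 3D point seen at pixel (u, v) with the given depth."""
    pixel_h = np.array([u, v, 1.0])
    return depth * np.linalg.solve(K, pixel_h)

def back_project_mask(K, depth_image, mask):
    """Back-project all masked pixels with positive depth into an (N, 3) array."""
    vs, us = np.nonzero(mask & (depth_image > 0))   # rows are v, columns are u
    z = depth_image[vs, us]
    pixels_h = np.stack([us, vs, np.ones_like(us)], axis=0).astype(float)
    rays = np.linalg.solve(K, pixels_h)              # K^{-1} applied to every pixel
    return (rays * z).T

# Hypothetical intrinsics for illustration only.
K = make_intrinsics(500.0, 500.0, 320.0, 240.0)
p = back_project(K, 420.0, 240.0, 2.0)
```

Here `mask` is assumed to be a boolean array with the same shape as `depth_image`.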
3. Geometric Formulation of the Cylinder
3.1. Parametric Representation of the Cylinder
An infinite 3D cylinder has five degrees of freedom (DoFs): four DoFs for the central axis and one for the radius , as shown in Figure 1a. In this paper, we adopt three parameterizations of the central axis: the point-direction form and Plücker line coordinates for geometric computation, and the orthonormal representation for optimization.
3.1.1. Point-Direction Form
Geometrically, the central axis is a 3D line. It can be represented by the point-direction form , where is a point on the central axis and denotes its direction. However, this representation uses six parameters for a 3D line with four DoFs, resulting in redundancy and potential numerical instability for optimization. To address this issue, we introduce the orthonormal representation [23], which provides a compact and well-conditioned parameterization for the optimization problem.
3.1.2. Plücker Line Coordinates
Since the orthonormal representation is derived from the Plücker line coordinates [24], we first review this concept. The Plücker line coordinates are defined as , where is the moment vector given by
which is perpendicular to the interpretation plane containing the central axis and the origin. Here, denotes the normalized direction vector. Due to the existence of Plücker constraints and , Plücker line coordinates have four DoFs in total, and as such provide a compact representation.
3.1.3. Orthonormal Representation
To eliminate the Plücker constraints and enable unconstrained optimization, the orthonormal representation [23] is introduced. This representation parameterizes a 3D line using a pair , which is derived from the Plücker line coordinates:
This results in a minimal four-DoF representation of a 3D line without Plücker constraints. This representation, serving as the parameterization of the central axis , enables joint optimization of all cylinder parameters on manifolds [15].
3.1.4. Representation Conversion
The conversion from orthonormal representation to Plücker line coordinates is given by
Once are obtained, the point on the 3D line closest to the origin is computed as
yielding the point-direction form. This establishes the correspondence among the three parameterizations considered in this study.
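The three parameterizations can be related with a short NumPy sketch. It assumes the moment convention m = p × d with a unit direction d and the orthonormal construction of [23]; the function names are hypothetical, and the degenerate case of a line through the origin (zero moment) is only detected, not handled.

```python
import numpy as np

def plucker_from_point_direction(p, d):
    """Return (moment, unit direction) of the line through p with direction d."""
    d = d / np.linalg.norm(d)
    return np.cross(p, d), d

def orthonormal_from_plucker(m, d):
    """Return (U, W) with U in SO(3) and W in SO(2)."""
    nm, nd = np.linalg.norm(m), np.linalg.norm(d)
    if nm < 1e-12:
        raise ValueError("line passes through the origin; moment direction is undefined")
    u1, u2 = m / nm, d / nd
    U = np.column_stack([u1, u2, np.cross(u1, u2)])
    s = np.hypot(nm, nd)
    w1, w2 = nm / s, nd / s
    W = np.array([[w1, -w2], [w2, w1]])
    return U, W

def plucker_from_orthonormal(U, W):
    """Recover (moment, unit direction) from the orthonormal representation."""
    w1, w2 = W[0, 0], W[1, 0]
    return (w1 / w2) * U[:, 0], U[:, 1]

def closest_point_to_origin(m, d):
    """Point on the line closest to the origin."""
    return np.cross(d, m) / np.dot(d, d)

m, d = plucker_from_point_direction(np.array([2.0, 1.0, 5.0]), np.array([0.0, 0.0, 3.0]))
U, W = orthonormal_from_plucker(m, d)
m_back, d_back = plucker_from_orthonormal(U, W)
p_closest = closest_point_to_origin(m_back, d_back)
```

The round trip recovers the same line, and `p_closest` is the foot of the perpendicular from the origin to the axis.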
3.2. Projection of the Cylinder onto the Image Plane
The projection of a cylinder onto the image plane is represented by two visible edges [25]. We first recall how to project a 3D line onto the image plane, then derive the projection of the cylinder’s edges.
3.2.1. Projection of a 3D Line
A 3D line represented in Plücker line coordinates projects onto the image plane using the mapping [26], defined as follows:
where denotes the homogeneous representation of the 2D line in the image plane, as shown in Figure 1b. The line projection matrix is determined by the camera’s intrinsic parameters
3.2.2. Projection of Cylinder Edges
Given the central axis of a cylinder, the 3D visible edges ( ) along with their 2D projections are derived through geometric computation. As shown in Figure 1b, and denote the points on the corresponding 3D lines that are closest to the camera optical center . The angle is defined as the angle between the vectors from to and . According to the trigonometric relationship within the right triangle formed by the hypotenuse and the opposite side r, the angle is computed as follows:
Once the angle is determined, are obtained by rotating about the axis through with direction by rotation angles of coupled with a scaling operation. The closed-form expression is given by
With established and the direction vector inherited from , the 3D visible edges are fully specified. The moment vectors are determined using Equation (5), and their projections are subsequently obtained via the mapping defined in (9):
To simplify notation, we denote this complete process as
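The edge projection described in this subsection can be sketched as follows, assuming the camera optical center is the origin of the camera frame and the moment convention m = p × d. With that convention, a homogeneous image line is obtained (up to scale) as the inverse transpose of the intrinsic matrix applied to the moment. The function names are hypothetical.

```python
import numpy as np

def rotate_about_axis(v, axis, angle):
    """Rodrigues rotation of v about a unit axis through the origin."""
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

def visible_edges_3d(p, d, r):
    """Points on the two visible (tangent) edges that are closest to the optical center."""
    d = d / np.linalg.norm(d)
    p0 = p - np.dot(p, d) * d            # axis point closest to the optical center
    dist = np.linalg.norm(p0)
    if dist <= r:
        raise ValueError("optical center lies inside or on the cylinder")
    theta = np.arcsin(r / dist)          # right triangle: hypotenuse dist, opposite side r
    scale = np.cos(theta)                # tangent length is dist * cos(theta)
    points = [scale * rotate_about_axis(p0, d, s * theta) for s in (1.0, -1.0)]
    return points, d

def project_edges(K, p, d, r):
    """Return the 3D edge points and the two projected homogeneous image lines."""
    points, d = visible_edges_3d(p, d, r)
    K_inv_T = np.linalg.inv(K).T
    lines = []
    for pj in points:
        m = np.cross(pj, d)              # moment of the 3D edge line
        l = K_inv_T @ m                  # homogeneous 2D line, defined up to scale
        lines.append(l / np.linalg.norm(l[:2]))  # scale so l @ [u, v, 1] is a pixel distance
    return points, lines

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
edge_points, edge_lines = project_edges(K, np.array([0.0, 0.3, 1.0]),
                                        np.array([0.0, 2.0, 0.0]), 0.1)
```

Each returned 3D point lies at distance r from the axis, and the viewing ray through it is tangent to the cylinder.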
4. Geometric Fitting of Cylinder Parameters via Edge–Point Cloud Fusion
4.1. Problem Formulation
We assume that a segmentation mask corresponding to the visible cylindrical surface of the target object is provided in the RGB image, and that its two longitudinal edges are available. We denote the observed edge segments as the set , where each represents an edge segment specified by its two endpoints in pixel coordinates. The point cloud data from the cylindrical surface are obtained by back-projecting the masked depth image, and are denoted as .
An overview of the proposed method is shown in Figure 2. We take the preprocessed data as input into the geometric fitting module, which integrates point cloud data and edge observations to jointly optimize the cylinder parameters. The objective function is formulated as a weighted combination of two energy terms:
where represents the complete set of cylinder parameters using an orthonormal representation [15] that models the target as a visible segment of an infinite cylinder, enabling parameter estimation from observed cylindrical surface fragments without requiring end-face visibility. The objective function consists of two key components:
1. The point-to-cylinder energy term ensures 3D geometric consistency, optimizing the model parameters against the observed point cloud data.
2. The edge alignment energy term constrains the pair of projected cylinder edges derived from the model parameters to align with the 2D edge annotations, ensuring spatial–visual consistency.
The edge fusion weight balances the contributions of the two energy terms. In the following sections, we detail the formulation of the energy terms and the optimization strategy. The estimated cylinder parameters provide a compact geometric representation of the target object, enabling geometric processing beyond parameter estimation. As an application, we show that the estimated model parameters enable recovery of the cylinder centroid and length as well as model-based completion of the point cloud.
4.2. Point-to-Cylinder Energy Term
This term quantifies the geometric deviation of the observed points from the estimated cylinder surface. We define the point-wise error term as the signed minimal distance from a point to the cylinder surface:
Consequently, the total point measurement energy is computed as the mean squared error over all N observations:
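A minimal NumPy sketch of this term, assuming the axis is given in point-direction form so that the signed error of a point is its radial distance to the axis minus the radius. The function names are hypothetical.

```python
import numpy as np

def point_to_cylinder_errors(points, p, d, r):
    """Signed distance of each point to the cylinder surface (positive outside)."""
    d = d / np.linalg.norm(d)
    radial = np.linalg.norm(np.cross(points - p, d), axis=1)
    return radial - r

def point_energy(points, p, d, r):
    """Mean squared point-to-cylinder error over all N observations."""
    e = point_to_cylinder_errors(points, p, d, r)
    return np.mean(e ** 2)

# One point 0.02 outside and one point 0.03 inside a cylinder of radius 0.05.
errors = point_to_cylinder_errors(
    np.array([[0.07, 0.0, 1.0], [0.02, 0.3, 1.0]]),
    np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0]), 0.05)
```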
4.3. Edge Alignment Energy Term
To avoid scale ambiguities and maintain metric consistency with , we define the edge alignment term in 3D space, allowing the weight to directly control the relative contributions of the two terms. Although the point cloud data corresponding to the annotation set can be reconstructed from the depth image for error formulation, the resulting measurements are frequently missing or unreliable in regions near object boundaries. For this reason, we develop a geometry-based approach that back-projects the annotated endpoints by intersecting their viewing rays with a plane derived from the model parameters . Notably, this approach does not rely on depth measurements at edge pixel locations. Therefore, even when depth data are missing near boundaries, the edge alignment term remains well-defined and provides effective geometric constraints. We first describe the data association strategy used to establish correspondences between the model parameters and the observed edge annotations , then present the formal definition of the edge alignment energy.
4.3.1. Edge-to-Model Data Association
To establish correspondences for the observed edge annotations , we generate the projected edges from the cylinder parameters using (14). Let denote the set of partitioned observations, where represents the specific segment in assigned to the projected line . The optimal association is determined by minimizing the cumulative alignment error. We construct a cost matrix , where each entry quantifies the geometric distance between the m-th observed segment and the j-th projected line :
where denotes the perpendicular distance from an endpoint to a line :
The assignment is obtained by comparing the cost of the direct correspondence (represented by the diagonal sum ) against the swapped correspondence (represented by the anti-diagonal sum ). Accordingly, the matched segments are determined by
The procedure for establishing these correspondences is detailed in Algorithm 1. Note that because the projected lines are implicitly dependent on the model parameters , the edge correspondence process is updated throughout the optimization process.
Algorithm 1: Data Association for Edge Alignment
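The association procedure described above can be sketched as follows. The cost entry is assumed here to be the sum of the perpendicular distances of the two endpoints of a segment to a projected line; whether the paper sums, averages, or squares these distances is not visible in this text, so treat that choice and the function names as assumptions.

```python
import numpy as np

def point_line_distance(pt, line):
    """Perpendicular pixel distance from a 2D point to a homogeneous line (a, b, c)."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / np.hypot(a, b)

def associate_edges(segments, lines):
    """
    segments: two observed segments, each a (2, 2) array of pixel endpoints.
    lines: two projected homogeneous lines.
    Returns (matched, C) where matched[j] is the segment assigned to lines[j].
    """
    C = np.array([[sum(point_line_distance(e, l) for e in seg) for l in lines]
                  for seg in segments])
    direct = C[0, 0] + C[1, 1]      # segment 0 -> line 0, segment 1 -> line 1
    swapped = C[0, 1] + C[1, 0]     # segment 0 -> line 1, segment 1 -> line 0
    if direct <= swapped:
        return [segments[0], segments[1]], C
    return [segments[1], segments[0]], C

# Two vertical lines x = 100 and x = 200; the observed segments are listed in swapped order.
lines = [np.array([1.0, 0.0, -100.0]), np.array([1.0, 0.0, -200.0])]
segments = [np.array([[201.0, 10.0], [201.0, 90.0]]),
            np.array([[99.0, 10.0], [99.0, 90.0]])]
matched, C = associate_edges(segments, lines)
```

With only two edges there are just two possible assignments, so comparing the diagonal and anti-diagonal sums is exhaustive.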
4.3.2. Energy Term Formulation
With the established correspondences, the edge alignment energy is formulated through a geometry-based back-projection strategy. As illustrated in Figure 3, each endpoint defines a viewing ray originating from the camera optical center :
A reference plane is constructed from the cylinder parameters , and is spanned by the two visible edges:
where denotes a point on the j-th visible edge and represents the axis direction. The plane is defined by the form , where is the plane normal and denotes a point on . By intersecting the viewing ray with the plane , the back-projected 3D point is obtained as
The edge alignment error term associated with edge is defined as the distance from to :
Finally, the total energy is formulated as the mean squared errors over all endpoints:
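The back-projection strategy of this subsection can be sketched in NumPy as follows. The sketch assumes the optical center is the origin of the camera frame, that a 3D point on each visible edge and the axis direction are already available, and that the segments have been matched to the edges. The function name is hypothetical, and rays parallel to the reference plane are not handled.

```python
import numpy as np

def edge_energy(K, matched_segments, edge_points, d):
    """
    matched_segments[j]: (2, 2) pixel endpoints assigned to visible edge j.
    edge_points[j]: a 3D point on visible edge j; d: axis direction.
    Returns (mean squared error, per-endpoint errors).
    """
    d = d / np.linalg.norm(d)
    p1, p2 = edge_points
    n = np.cross(p2 - p1, d)                 # normal of the plane spanned by both edges
    n = n / np.linalg.norm(n)
    K_inv = np.linalg.inv(K)
    errors = []
    for pj, seg in zip(edge_points, matched_segments):
        for u, v in seg:
            ray = K_inv @ np.array([u, v, 1.0])   # viewing ray through the optical center
            t = np.dot(n, p1) / np.dot(n, ray)    # ray-plane intersection parameter
            P = t * ray                           # back-projected 3D endpoint
            errors.append(np.linalg.norm(np.cross(P - pj, d)))  # distance to 3D edge j
    errors = np.array(errors)
    return np.mean(errors ** 2), errors

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
d = np.array([0.0, 1.0, 0.0])
edge_points = [np.array([0.1, 0.0, 1.0]), np.array([-0.1, 0.0, 1.0])]
# First endpoint of the first segment is shifted by 5 pixels; the others are exact.
segments = [np.array([[375.0, 215.0], [370.0, 265.0]]),
            np.array([[270.0, 215.0], [270.0, 265.0]])]
energy, errors = edge_energy(K, segments, edge_points, d)
```

Because the error is measured in 3D after intersecting with the model plane, no depth value at the edge pixels is needed.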
4.4. Solver
The minimization of (15) is a weighted nonlinear least-squares problem, which is solved by using a coarse-to-fine strategy. Specifically, an initial estimate is first obtained via a RANSAC-based method [7] and subsequently refined through iterative optimization.
Since the variables associated with the central axis must reside on manifolds during iteration, we adopt the optimization technique on manifold [27]. At the n-th iteration, a perturbation vector is applied to the current estimate , resulting in an updated state defined as
By linearizing (15) around the current estimate , the objective function (15) is approximated as
where is the stacked error vector and is the Jacobian matrix:
The weighting matrix balances the contributions of the point-to-cylinder term and the edge alignment term in the overall objective function, and is defined as
To minimize the formulated weighted nonlinear least squares problem, we adopt the Levenberg–Marquardt (LM) algorithm [28]. At each iteration, the update step is computed by solving the following normal equations:
In these normal equations, the damping factor is adjusted adaptively. If the update reduces the cost, the update step is accepted and the damping factor is decreased; otherwise, the update is rejected and the damping factor is increased. The complete procedure is summarized in Algorithm 2.
Algorithm 2: Iterative Cylinder Refinement via Edge–Point Cloud Fusion
- *Initialization.* Inputs: the initial guess, the point set, the observed edge pairs, the damping factor, and the scale factor.
- Set the current estimate to the initial guess.
- Construct the weighting matrix via Equation (28).
- *Update.* (Optimization loop)
- Under the current estimate, extract the associated edge set with AssociateEdges (Algorithm 1).
- Construct the stacked error vector and the Jacobian matrix via Equation (27).
- Compute the update step via Equation (29).
- If the update reduces the cost, accept the update and decrease the damping factor. Otherwise, reject the update and increase the damping factor.
- *Termination.* Repeat the Update step until convergence. Output the optimized parameters.
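The accept/reject logic of the damped update can be illustrated with a generic weighted Levenberg–Marquardt loop. This is a simplified sketch, not the authors' solver: it uses a forward-difference Jacobian and a plain additive update on a Euclidean toy problem (circle fitting), whereas the method above updates the axis variables on manifolds and re-associates edges in every iteration. The names `weighted_lm`, `mu`, and `nu` are assumptions.

```python
import numpy as np

def numeric_jacobian(residual_fn, x, eps=1e-6):
    """Forward-difference Jacobian of the stacked residual vector."""
    e0 = residual_fn(x)
    J = np.zeros((e0.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (residual_fn(x + dx) - e0) / eps
    return J

def weighted_lm(residual_fn, x0, weights, mu=1e-3, nu=10.0, max_iter=100, tol=1e-12):
    """Minimize e(x)^T W e(x) with a diagonal weighting matrix W."""
    x = np.asarray(x0, dtype=float)
    W = np.diag(weights)
    cost = lambda e: float(e @ W @ e)
    e = residual_fn(x)
    for _ in range(max_iter):
        J = numeric_jacobian(residual_fn, x)
        H = J.T @ W @ J
        g = J.T @ W @ e
        delta = np.linalg.solve(H + mu * np.eye(x.size), -g)  # damped normal equations
        x_new = x + delta            # additive update; a manifold method would retract here
        e_new = residual_fn(x_new)
        if cost(e_new) < cost(e):    # accept the step and relax the damping
            x, e = x_new, e_new
            mu /= nu
            if np.linalg.norm(delta) < tol:
                break
        else:                        # reject the step and increase the damping
            mu *= nu
    return x

# Toy problem: recover center (1, 2) and radius 0.5 of a circle from a half arc.
angles = np.linspace(0.0, np.pi, 30)
pts = np.column_stack([1.0 + 0.5 * np.cos(angles), 2.0 + 0.5 * np.sin(angles)])
circle_residuals = lambda x: np.linalg.norm(pts - x[:2], axis=1) - x[2]
x_hat = weighted_lm(circle_residuals, [0.8, 2.1, 0.4], np.full(len(pts), 1.0 / len(pts)))
```

In the fused setting, the diagonal of the weighting matrix would carry different values for the point block and the edge block of the stacked residual.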
4.5. Applications
The optimized cylinder parameters serve as a geometric prior for downstream tasks. This section presents two downstream applications enabled by the estimated cylinder model: model-based point cloud completion and finite extent recovery for cylindrical objects.
4.5.1. Model-Based Point Cloud Completion
Due to the sensor’s inherent limitations and measurement noise, raw point cloud data often suffer from missing or corrupted regions. To address this, we employ a ray tracing-based approach [29] to restore surface geometry. Based on the optimized cylinder parameters , points in the masked region are reconstructed by intersecting the cylinder model with rays originating from the camera optical center . This process yields geometrically complete point cloud data .
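A minimal sketch of the ray–cylinder intersection behind this completion step, assuming the optical center is the origin of the camera frame and the cylinder is given in point-direction form. The function names are hypothetical, and only the nearest intersection (the visible surface) is returned.

```python
import numpy as np

def intersect_ray_cylinder(ray, p, d, r):
    """Nearest intersection of the ray t * ray (t > 0) with an infinite cylinder, or None."""
    d = d / np.linalg.norm(d)
    ray_perp = ray - np.dot(ray, d) * d      # ray component orthogonal to the axis
    p_perp = p - np.dot(p, d) * d            # axis offset orthogonal to the axis
    A = np.dot(ray_perp, ray_perp)
    B = np.dot(ray_perp, p_perp)
    C = np.dot(p_perp, p_perp) - r ** 2
    if A < 1e-12:
        return None                          # ray is parallel to the axis
    disc = B ** 2 - A * C
    if disc < 0.0:
        return None                          # ray misses the cylinder
    t = (B - np.sqrt(disc)) / A              # smaller root: first surface hit
    return t * ray if t > 0.0 else None

def complete_point_cloud(K, pixels, p, d, r):
    """Reconstruct surface points for the given (u, v) pixels from the cylinder model."""
    K_inv = np.linalg.inv(K)
    hits = []
    for u, v in pixels:
        hit = intersect_ray_cylinder(K_inv @ np.array([u, v, 1.0]), p, d, r)
        if hit is not None:
            hits.append(hit)
    return np.array(hits)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
axis_point, axis_dir, radius = np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0]), 0.1
completed = complete_point_cloud(K, [(320.0, 240.0), (600.0, 240.0)], axis_point, axis_dir, radius)
```

Pixels whose rays miss the model are skipped, so the output may contain fewer points than the mask.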
4.5.2. Finite Extent Recovery
Although the optimized parameter set corresponds to an infinite cylinder, real-world applications require a finite-length cylinder representation. Since single-view observations are not always complete, our objective is to recover the observed centroid and the observed length defined on the masked region of the cylindrical surface. Specifically, we define as the maximal extent of the visible cylindrical surface and set to be the midpoint of this extent. We first project the points in onto the estimated cylinder central axis, resulting in a projected point set . The two endpoints and are obtained by maximizing the pairwise distance within the set :
The centroid of a finite cylinder is computed by the midpoint of the two endpoints, and the visible length is given by their distance:
This formulation is independent of prior assumptions on object length and accommodates both full and partial visibility. When the observed surface covers the full span of the cylindrical object, the estimated parameters align with the physical centroid and length. Under partial observability (e.g., due to invisible end-faces or external occlusion), the method recovers the centroid and axial extent of the visible segment.
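The finite extent recovery can be sketched as below. Because the projected points are collinear, maximizing the pairwise distance is equivalent to taking the minimum and maximum scalar coordinate along the axis, which is what this sketch does. The function name is hypothetical.

```python
import numpy as np

def finite_extent(points, p, d):
    """Return (centroid, length, endpoints) of the visible segment along the axis."""
    d = d / np.linalg.norm(d)
    t = (points - p) @ d                 # scalar coordinate of each point along the axis
    e1 = p + t.min() * d
    e2 = p + t.max() * d
    centroid = 0.5 * (e1 + e2)           # midpoint of the two endpoints
    length = np.linalg.norm(e2 - e1)     # visible axial extent
    return centroid, length, (e1, e2)

# Surface points of a cylinder along the y-axis, visible from y = -0.02 to y = 0.08.
th = np.linspace(0.0, np.pi, 50)
y = np.linspace(-0.02, 0.08, 50)
pts = np.column_stack([0.1 * np.cos(th), y, 1.0 + 0.1 * np.sin(th)])
centroid, length, endpoints = finite_extent(pts, np.array([0.0, 0.5, 1.0]),
                                            np.array([0.0, -2.0, 0.0]))
```

The result does not depend on which axis point or direction sign is supplied, since only the extreme projections matter.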
5. Experiments
In this section, we conduct a series of experiments to provide a comprehensive evaluation of the proposed edge–point cloud fusion method. All experiments are performed on a desktop with an Intel i5-12400F CPU (Intel Corporation, Santa Clara, CA, USA), 32 GB RAM, and an NVIDIA RTX 3060Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA). We implement our method in JAX [30] for numerical computing, which enables GPU acceleration of the computations.
This section is organized as follows: Section 5.1 introduces the dataset and evaluation metrics; Section 5.2 describes the baseline methods; in Section 5.3, we conduct an ablation study on the edge fusion weight, while Section 5.4 performs the sensitivity analysis; Section 5.5 presents comparison results against other methods; Section 5.6 presents the computational efficiency analysis; finally, we provide an application in Section 5.7 by demonstrating its performance in a real-world piping environment.
5.1. Datasets and Evaluation Metrics
To conduct experiments under controlled conditions, we use a consumer-grade RGB-D camera (Astra+, Orbbec Technology Co., Ltd., Shenzhen, China) for data acquisition, which measures depth data by a monocular speckled structured-light technique. To improve pixel-level alignment, we perform camera calibration using the standard checkerboard-based method [31], as detailed in [15]. The raw depth images are subsequently registered to the RGB coordinate system to generate the aligned RGB-D input required by our method. The image resolution is set to pixels. Standard aluminum cylinders with radii mm and a fixed length of 100 mm are considered as testing targets. This range of radii is selected in order to assess performance across different cylinder sizes.
5.1.1. Data Acquisition with Viewpoint Variations
To assess the performance of the proposed method under different observation conditions, we introduce variations in the camera viewpoint. A six-degree-of-freedom industrial robot arm (UR5e, Universal Robots, Odense, Denmark) is used to control the camera motion. As shown in Figure 4a, an RGB-D camera is mounted on the end-effector of the robot arm, allowing the camera to be moved to a target pose under consistent and reproducible motion conditions. All data were collected under stable indoor lighting conditions.
The target cylinder is placed at the center of a planar board with fiducial AprilTag markers [32]. The board provides visual references for guiding the robot during camera motion and enables the establishment of ground-truth references. We define the camera viewpoint by the tilt angle and the working distance , with the target cylinder appearing at the center of the image, as shown in Figure 4a. Within the camera’s operating range, the tilt angle is specified as and the working distance is fixed at m. Figure 4b shows an example of the captured point cloud data. In the extreme case of large tilt angles and small radii (i.e., and mm), data acquisition fails due to the sensor’s inherent limitations when measuring highly curved surfaces at oblique viewing angles. This setup yields 24 valid configurations, with the robot remaining stationary for each one while capturing 20 consecutive RGB-D images in order to evaluate repeatability. Figure 5 reports the point number statistics of different cylinder radii in the dataset.
To establish ground-truth references, we follow the procedures from Hinterstoisser et al. [33]. The target cylinder is manually placed at the center of the planar board so that the center point and axis direction relative to the planar board are known. The camera pose relative to the planar board is computed by solving the Perspective-n-Point (PnP) problem using the method in [34]. Therefore, the complete ground-truth cylinder parameters in the camera frame are determined by combining the known radius and length with the estimated center point and axis direction.
5.1.2. Data Annotation
We employ manual annotations for the cylindrical region and edge segments to ensure a strictly controlled evaluation. Specifically, the cylindrical region in the RGB images is annotated using the Labelme tool [35] and the point cloud data belonging to the target cylinder are extracted by back-projecting the depth images. The two visible longitudinal edges are manually labeled to generate the observed edge set , which lies along the object boundaries in the RGB images. Figure 4c illustrates an example of the annotated data. This experimental design explicitly decouples geometric parameter estimation errors from potential uncertainties introduced by upstream detection modules. This isolation allows for a rigorous benchmarking of the theoretical upper bound on the accuracy and robustness of the proposed method.
5.1.3. Evaluation Metrics
We evaluate estimation accuracy using four metrics: orientation error , position error , relative radius error , and relative length error . Let the estimated parameters be and the ground truth . The metrics are defined as
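The metric definitions are not reproduced in this text, so the following is only a plausible sketch under common conventions: the orientation error is taken as the sign-invariant angle between the two axis directions in degrees, the position error as the Euclidean distance between centroids, and the radius and length errors as absolute differences relative to the ground truth. All names and these exact forms are assumptions.

```python
import numpy as np

def evaluation_metrics(d_est, c_est, r_est, l_est, d_gt, c_gt, r_gt, l_gt):
    """Return (orientation error [deg], position error, relative radius error, relative length error)."""
    cos_angle = abs(np.dot(d_est, d_gt)) / (np.linalg.norm(d_est) * np.linalg.norm(d_gt))
    e_orient = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))  # axis sign is ignored
    e_pos = np.linalg.norm(np.asarray(c_est) - np.asarray(c_gt))
    e_radius = abs(r_est - r_gt) / r_gt
    e_length = abs(l_est - l_gt) / l_gt
    return e_orient, e_pos, e_radius, e_length

a = np.radians(5.0)
metrics = evaluation_metrics(
    np.array([0.0, np.sin(a), np.cos(a)]), np.array([3.0, 4.0, 0.0]), 11.0, 95.0,
    np.array([0.0, 0.0, 1.0]), np.zeros(3), 10.0, 100.0)
```

The position error is returned in whatever unit the centroids are expressed in.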
5.2. Baseline Methods
The proposed method is evaluated against three representative point-based cylinder fitting baselines: a RANSAC-based approach [7] as implemented in [36], and two least squares-based methods proposed by Eberly [37] and Zhang et al. [15]. Since the proposed method operates as a backend refinement module, RANSAC is used both to initialize the solver and as a baseline representing the coarse initial estimate. Eberly’s method [37] estimates the cylinder’s orientation by minimizing a quadratic form, followed by closed-form solutions of the remaining geometric parameters. The method of Zhang et al. [15] is a special case of the proposed method when the edge fusion weight is set to zero. All methods follow their default implementation. To ensure a fair comparison, both least squares-based baselines are initialized with the same RANSAC-based estimate as ours.
It is worth noting that the three baseline methods rely solely on point information. Furthermore, these methods estimate an infinite cylinder model, and as such do not directly provide the object centroid or finite length. To ensure a consistent comparison, we apply the same postprocessing procedure described in Section 4.5 to recover these parameters for all compared methods.
5.3. Ablation Study on Edge Fusion Weight
To analyze the effect of the edge-fusion weight , we use a small validation subset consisting of the first frame from six representative configurations of the complete dataset. These configurations are defined by three cylinder radii mm and two tilt angles . A grid search over is then performed on this validation subset.
As shown in Figure 6, when (i.e., using point term only), performance degrades for mm cylinders, while mm cylinders are less affected. This is due to the inherent limitations of the RGB-D sensor when measuring surfaces with high curvature, which make the point cloud data unreliable and degrade the parameter estimation performance. When (edge term only), the solver becomes unstable, as the back-projection-based edge alignment leads to an ill-conditioned optimization problem. In the absence of point-based metric constraints, the cylinder scale becomes weakly constrained, especially the radius, resulting in ambiguous or unreliable parameter updates. As a result, the radius is not reliably optimized and tends to remain close to its initial value, while only the central axis is refined. In contrast, the weight values ( ) lead to improved accuracy across all evaluated configurations. This result highlights the roles of the two energy terms in parameter estimation. The absolute metric scale is provided by the point-to-cylinder term , and geometric consistency is additionally enforced by the edge alignment term . Based on this analysis, we choose for all subsequent experiments, since this value provides consistent performance across different configurations.
5.4. Sensitivity Analysis
This section presents a sensitivity analysis of the proposed method. We evaluate the robustness of the proposed method against perturbations in the edge observations as well as in the solver’s initialization, both of which are critical factors. The analysis is conducted on the validation subset defined in Section 5.3.
5.4.1. Robustness to Edge Perturbation
To simulate noisy edge observations, we introduce synthetic perturbations to the endpoint coordinates of the observed edge set . Specifically, each endpoint is corrupted by additive zero-mean isotropic Gaussian noise. To control the noise level, the standard deviation (Std) value of the noise is specified as pixels. As shown in Figure 7a, larger values will lead to more significant deviations from the original edge annotations. For each level, 100 independent trials were performed.
Figure 7b presents the statistical results of the proposed method under different levels of edge perturbation. As the perturbation level increases, our method exhibits a gradual degradation in estimation accuracy. At high perturbation levels ( pixels), the mean values of , , , and reach up to , mm, , and , respectively. The Std values also exhibit relatively high magnitudes, indicating increased estimation variability. This performance degradation can be attributed to the inherent uncertainty in point cloud data, which is further amplified by severe corruption of edge observations. Nevertheless, the proposed method demonstrates robust performance under moderate edge perturbations ( pixels). In this situation, the mean values of , , , and remain below , mm, , and , respectively, with relatively small Std values. These results indicate that the proposed method is tolerant to inaccurate edge observations and can be integrated into practical perception pipelines under reasonable noise conditions.
5.4.2. Robustness to Initialization Perturbations
To evaluate the robustness of the proposed iterative solver to initialization perturbations, we perform a quantitative sensitivity analysis. The edge alignment energy term depends on a back-projection plane computed from the cylinder parameters . Therefore, large deviations in the initial estimate may distort the geometry of , weakening the geometric consistency of the edge alignment constraints during iterative optimization.
To assess the solver under such conditions, we perturb the ground-truth cylinder parameters with zero-mean isotropic Gaussian noise to generate perturbed initial estimates. Specifically, we apply perturbations to the ground-truth direction , center point , and radius :
where the unit-length constraint is preserved because is an orthogonal matrix. Here, the parameters , , and control the perturbation magnitudes of direction, position, and relative radius, respectively. We consider three severity levels and evaluate four noise configurations (one low, two medium, and one high), as summarized in Table 1. For each configuration, we perform 100 independent trials. Figure 8a visualizes the perturbed initial estimates under different severity levels.
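As a rough illustration of this kind of initialization perturbation, the sketch below rotates the direction with a rotation matrix applied through Rodrigues' formula, adds Gaussian noise to the center point, and scales the radius by a relative Gaussian factor. The exact sampling scheme is an assumption: the choice of a rotation axis perpendicular to the cylinder direction, the parameter names `sigma_deg`, `sigma_pos`, and `sigma_rel`, and the example values are hypothetical and do not reproduce the settings in Table 1.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about the unit vector `axis` by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return [v[i] * c + kxv[i] * s + axis[i] * kdv * (1.0 - c) for i in range(3)]

def perturb_initialization(direction, center, radius, sigma_deg, sigma_pos, sigma_rel, rng):
    """Return a perturbed (direction, center, radius); `direction` must be unit length."""
    # Assumed scheme: random rotation axis perpendicular to the cylinder direction
    # (degenerate only if the random vector is exactly parallel to `direction`).
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    axis = normalize([gi - dot(g, direction) * di for gi, di in zip(g, direction)])
    angle = math.radians(rng.gauss(0.0, sigma_deg))
    new_direction = rotate(direction, axis, angle)  # a rotation keeps unit length
    new_center = [ci + rng.gauss(0.0, sigma_pos) for ci in center]
    new_radius = radius * (1.0 + rng.gauss(0.0, sigma_rel))  # relative radius noise
    return new_direction, new_center, new_radius

# Hypothetical ground truth: axis along z, 800 mm from the camera, radius 20 mm.
d0, c0, r0 = [0.0, 0.0, 1.0], [0.0, 0.0, 800.0], 20.0
d1, c1, r1 = perturb_initialization(d0, c0, r0, 5.0, 10.0, 0.1, random.Random(0))
```

Because the direction is only ever rotated, no renormalization step is needed after the perturbation, which matches the remark above about the orthogonal matrix preserving unit length.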
Figure 8b reports the quantitative results. The proposed method remains robust under low and medium perturbations, consistently refining the perturbed initial estimates with stable convergence. When the perturbation reaches a high level, the performance degrades, indicating that highly inaccurate initializations can compromise the informativeness of the edge alignment constraints. In practice, the RANSAC-based initialization in our pipeline provides sufficiently accurate initial estimates, so the solver typically converges to a consistent solution.
5.5. Comparison with Baseline Methods
5.5.1. Quantitative Comparison
Figure 9 presents quantitative comparison results obtained from repeated RGB-D captures. For each metric, we report the mean and Std. Our method achieves mean errors below for , mm for , for , and for across all settings, while also exhibiting the smallest Std over repeated runs. In contrast, the point-based baseline methods are sensitive to data acquisition conditions, particularly variations in the cylinder radius and tilt angle . RANSAC shows pronounced instability, which is expected because it serves only as the coarse initialization step; the proposed edge–point fusion strategy refines this rough initial estimate and reaches accurate solutions despite the inaccurate initialization. The methods of Eberly [37] and Zhang et al. [15], both of which rely solely on point cloud data, show limited parameter estimation accuracy even when initialized with the same RANSAC-based estimate. In particular, reducing the cylinder radius leads to marked degradation in estimation accuracy. Variations in the tilt angle also cause noticeable changes in both the mean errors and Std values, with the impact being especially evident for and at larger tilt angles . Because the cylinder centroid and length are recovered using a model-based approach, errors in the upstream parameter estimation accumulate and are amplified, an effect that is more pronounced at higher tilt angles. As the curvature of the cylindrical surface increases or the tilt angle becomes larger, the RGB-D camera struggles to capture accurate point cloud information, which directly degrades the performance of such point-based methods.
These results show that the proposed method remains reliable and robust across different cylinder radii and tilt angles . Although the quality of point cloud data degrades for small-radius cylinders and large tilt angles, the integration of edge information compensates for the limitations of point data, leading to improved estimation accuracy and robustness. In addition, the edge fusion weight determined from the validation subset remains effective beyond the validation settings and generalizes to the complete dataset.
5.5.2. Qualitative Comparison
A comparison between the estimated cylinder and the ground-truth reference model for representative cases is shown in Figure 10. These results show that RANSAC exhibits a noticeable deviation in orientation, whereas the methods proposed by Eberly [37] and Zhang et al. [15] achieve similar orientation estimation. Although all baseline methods demonstrate strong performance for the large-radius target ( mm), they tend to overestimate the radius and exhibit position drift in the case of the small-radius target ( mm). By contrast, our approach produces cylinder estimates with an alignment that is more consistent with the ground truth.
Figure 11 visualizes the projected edges. A projection that lies closer to the annotation indicates that the estimated cylinder parameters better satisfy the projection relationship. As in the cylinder model comparison, the baseline methods show significant offsets from the annotated edges. By explicitly using the edge information, our method achieves better alignment, which further supports the accuracy of the estimated parameters.
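For readers who want to reproduce this kind of edge-projection check, the sketch below computes the two silhouette (occluding) lines of a cylinder seen from a pinhole camera and projects their endpoints into the image. It is a generic geometric construction rather than the paper's implementation: the camera is assumed to sit at the origin of its own frame with no lens distortion, and the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def silhouette_segments(center, direction, radius, length):
    """Two 3D segments where viewing rays from the camera origin graze the cylinder.

    Coordinates are in the camera frame; `direction` must be a unit vector.
    Returns a list of (start, end, surface_normal) triples.
    """
    to_cam = [-c for c in center]
    along = dot(to_cam, direction)
    # Vector from the axis to the camera centre, perpendicular to the axis.
    v = [to_cam[i] - along * direction[i] for i in range(3)]
    dist = math.sqrt(dot(v, v))
    if dist <= radius:
        raise ValueError("camera centre lies inside or on the cylinder")
    u = [x / dist for x in v]
    w = cross(direction, u)
    cos_phi = radius / dist  # tangent condition for a circle seen from distance `dist`
    sin_phi = math.sqrt(1.0 - cos_phi * cos_phi)
    half = [0.5 * length * x for x in direction]
    segments = []
    for sign in (1.0, -1.0):
        normal = [cos_phi * u[i] + sign * sin_phi * w[i] for i in range(3)]
        mid = [center[i] + radius * normal[i] for i in range(3)]
        start = [mid[i] - half[i] for i in range(3)]
        end = [mid[i] + half[i] for i in range(3)]
        segments.append((start, end, normal))
    return segments

def project(point, fx, fy, cx, cy):
    """Ideal pinhole projection of a camera-frame point with positive depth."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical vertical pipe 1000 mm in front of the camera, radius 20 mm, length 200 mm.
segs = silhouette_segments([0.0, 0.0, 1000.0], [0.0, 1.0, 0.0], 20.0, 200.0)
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
edges_px = [(project(s, fx, fy, cx, cy), project(e, fx, fy, cx, cy)) for s, e, _ in segs]
```

Comparing `edges_px` with annotated edge endpoints gives a simple pixel-space measure of how well a set of estimated cylinder parameters satisfies the projection relationship.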
Figure 12 presents a qualitative comparison of the point cloud completion results. Because completion is an intermediate result that depends on the fitted cylinder, more accurate parameter estimates produce higher-quality completions. The baseline methods produce incomplete reconstructions with noticeable geometric distortions, primarily due to inaccurate parameter estimation. In contrast, the proposed method yields a geometrically consistent completion of the cylindrical point cloud. Owing to this geometric consistency, the cylinder length and centroid can be reliably recovered using the model-based approach, resulting in more accurate estimates.
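One simple way to see why length and centroid depend on the fitted axis is to project the points onto that axis and read off the extent. The sketch below does exactly this. It is an assumed, simplified stand-in for the model-based recovery mentioned above, whose details are not reproduced here; the function name `axial_extent` and the synthetic data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axial_extent(points, axis_point, direction):
    """Estimate cylinder length and centroid from point projections onto the axis.

    `direction` must be a unit vector. Uses the raw min/max of the projections,
    so it assumes outliers have already been removed.
    """
    ts = [dot([p[i] - axis_point[i] for i in range(3)], direction) for p in points]
    t_min, t_max = min(ts), max(ts)
    t_mid = 0.5 * (t_min + t_max)
    centroid = [axis_point[i] + t_mid * direction[i] for i in range(3)]
    return t_max - t_min, centroid

# Synthetic surface points: radius 20, axis along z, spanning z = 10 to z = 110.
points = [
    [20.0 * math.cos(a), 20.0 * math.sin(a), z]
    for a in (0.0, 1.0, 2.0, 3.0)
    for z in (10.0, 60.0, 110.0)
]
length, centroid = axial_extent(points, [0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

With a tilted or offset axis estimate, the projections shift and stretch, so errors in the fitted axis carry over directly into the recovered length and centroid.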
5.6. Computation Efficiency Analysis
Table 2 presents the running time of each method, where we report the average and Std over all the datasets used in our experiments. Notably, RANSAC and Eberly’s method are implemented on the CPU, whereas the method of Zhang et al. [15] and our approach are implemented on the GPU.
RANSAC is the fastest method and achieves the shortest runtime of s, as it is used for coarse initialization and obtains an approximate solution through random sampling. By contrast, Eberly’s method is the slowest, requiring s and exhibiting a large Std of s. This is because the method is implemented on the CPU, and as such is sensitive to changes in the size of the input point clouds. Denser point clouds lead to higher computational cost, while variations in point cloud size result in less stable runtimes.
To ensure a fair comparison, we evaluate our method against the method of Zhang et al. [15] under the same hardware settings. Both methods benefit from CPU and GPU acceleration and achieve good runtimes. The method of Zhang et al. [15], which optimizes only the point-based term , achieves an average runtime of s, whereas our method attains an average runtime of s. The s overhead results from the extra computation introduced by the edge alignment constraints in the fusion strategy. This overhead is moderate and is justified by the large gains in accuracy and robustness.
5.7. Application Demonstration on a Real-World Piping Environment
To illustrate the practical applicability of the proposed method under real-world sensing conditions, we present a field demonstration in a piping scenario. As shown in Figure 13, a tripod-mounted RGB-D camera is oriented to face a number of straight pipes for data acquisition. Scenario 1 (S1) consists of an outdoor wall-mounted pipe (S1-O1) with a radius of mm. Scenario 2 (S2) consists of an indoor ceiling-mounted piping system composed of four smaller pipes: S2-O1 has a radius of mm, while S2-O2 through S2-O4 each have a radius of mm. Since real-world piping environments make it difficult to establish reliable ground truth for complete cylinder parameters, this subsection focuses on an application-oriented demonstration rather than a strict quantitative evaluation. For pipes with known radii , we report radius estimation results as an indicative quantitative metric. To facilitate qualitative comparison under real sensing conditions, we also provide visualizations of edge reprojection and reconstructed piping models.
Figure 14 presents the quantitative results of radius estimation. Our method achieves lower mean values of and keeps smaller Std values across all tested pipes. This result shows higher accuracy and better robustness when estimating pipe radii in real-world conditions. In contrast, the three point-based baseline methods produce much larger errors in radius estimation. Figure 15 shows the differences between these methods through edge projections. Since our method uses edge information, the projected edges closely follow the annotated piping boundaries. This indicates that the central axis (i.e., the orientation and position) of the pipes is well-estimated. On the other hand, the baseline methods show clear drift in their edge projections. This drift indicates inaccurate parameter estimation in the central axis. As a result, the reconstructed piping models from the baseline methods do not match the true piping geometry. By comparison, our method produces geometrically consistent piping models. Notably, although the end-faces of the pipes are not fully visible, the finite extents are recovered to represent the observed pipe segments. This consistency is shown by the better alignment with the observed piping boundaries in the RGB images.
6. Discussion
This section discusses the proposed method, its limitations, and directions for future research.
6.1. Limitations
Although the proposed edge–point fusion method achieves consistent improvements under the tested conditions, real-world performance will depend on uncertainties introduced by upstream perception modules. In practical deployments, cylindrical regions and edge features are typically produced by automatic detection, segmentation, and edge extraction pipelines, which can be sensitive to scene factors such as lighting variation and background texture. To evaluate the fusion-based geometric fitting itself in a controlled setting, we use manual annotations in this study. This choice isolates the proposed formulation from detector-dependent errors, but also means that upstream failures (e.g., missing, biased, or spurious edges) are not explicitly modeled. Within an end-to-end perception pipeline, the fusion-based fitting component would be applied downstream of standard perception modules. While modeling detector-induced uncertainty and validating the full pipeline are beyond the scope of this geometry-centric study, they remain important directions for future work, as they require jointly considering both upstream perception and downstream fitting.
In our experiments, we use a fixed edge fusion weight that is selected on a validation subset and then applied to all evaluated configurations. Despite its effectiveness being validated under the tested conditions, a fixed weighting can be suboptimal across different RGB-D sensors and diverse scenes. For example, a fixed setting may fail to adapt to the changing reliability of the two modalities in low-texture scenes where edge confidence is reduced or in the presence of severe point cloud noise and outliers. Moreover, while we evaluate different radii and viewing angles, extreme conditions such as highly specular reflections or heavy occlusion were not explicitly modeled. These limitations motivate adaptive fusion strategies guided by confidence and noise characteristics as well as more explicit robustness to outliers.
Finally, the current work assumes ideal cylindrical geometry. As such, its direct applicability is limited for objects often encountered in real-world settings that deviate substantially from ideal cylinders, such as hoses, cables, or deformed pipes. More flexible shape parameterizations would be beneficial in extending the proposed method beyond ideal cylinders.
6.2. Future Perspectives
The results suggest that the proposed method improves robustness under challenging conditions such as small cylinder radii and large viewing angles, which commonly occur with consumer-grade RGB-D sensors. Given that this study focuses on fusion-based geometric fitting under controlled settings, several directions remain to improve practical deployment and broaden applicability. (1) A natural extension would be to integrate an automatic detector in order to realize a fully end-to-end system in which detection, edge extraction, and parameter estimation are performed within a single pipeline without manual annotation. To reduce reliance on manual annotation, future work could study self-supervised or weakly supervised schemes that leverage temporal or multi-view consistency constraints to support automatic annotation generation [38]. (2) An adaptive weighting strategy could be explored to dynamically balance the contributions of point and edge constraints according to sensor characteristics and scene conditions. For instance, the method could incorporate sensor noise models or edge confidence maps to adaptively reduce edge weight in low-texture environments, and the fusion weights could be adjusted according to point measurement noise and sampling density. (3) Beyond ideal cylindrical shapes, the current method can be extended to support approximately cylindrical objects. One possible direction is to generalize the central axis representation as a smooth curve with spatially varying radius. Under such a formulation, edge information may still provide useful geometric constraints to compensate for point cloud inaccuracies on curved or deformable surfaces, broadening applicability to real-world perception tasks. (4) In addition to detector integration, recent learning-based methods may complement the proposed geometric fitting method by providing candidate regions or initial hypotheses that reduce the downstream search space [39] and by providing image-conditioned priors for point cloud denoising and completion under partial or noisy observations [40]. Integrating such components with the proposed formulation is promising, but requires careful treatment of uncertainty propagation and systematic error sources.
7. Conclusions
This paper proposes an edge–point cloud fusion approach for estimating cylinder parameters. By leveraging edge features as an additional geometric source, the proposed method jointly optimizes the full set of cylinder parameters by fusing edge-derived constraints with point cloud information. The experiments show significant improvements in accuracy and robustness for the proposed method compared with point-based fitting approaches. Future work will focus on achieving fully automatic processing and extending the method to handle approximately cylindrical structures and more complex geometric shapes.
References
- 1. Kim D.M., Ahn J., Kim S.W., Lee J., Kim M., Han J. Real-time reconstruction of pipes using RGB-D cameras. Comput. Animat. Virtual Worlds 2023, 35, e2197. doi:10.1002/cav.2197
- 2. Moritani R., Kanai S., Date H., Watanabe M., Nakano T., Yamauchi Y. Cylinder-based simultaneous registration and model fitting of laser-scanned point clouds for accurate as-built modeling of piping system. Comput.-Aided Des. Appl. 2018, 15, 720–733. doi:10.1080/16864360.2018.1441239
- 3. Cao G. Automated detection of cylindrical structures in complex pipelines using iterative point cloud segmentation and high-precision fitting. Sci. Rep. 2025, 15, 45535. doi:10.1038/s41598-025-30323-8. PMID: 41310248. PMCID: PMC12749517
- 4. Li C., Chen P., Xu X., Wang X., Yin A. A coarse-to-fine method for estimating the axis pose based on 3D point clouds in robotic cylindrical shaft-in-hole assembly. Sensors 2021, 21, 4064. doi:10.3390/s21124064. PMID: 34204808. PMCID: PMC8231622
- 5. Dong H., Zhou J., Qiu C., Prasad D.K., Chen I.M. Robotic manipulations of cylinders and ellipsoids by ellipse detection with domain randomization. IEEE/ASME Trans. Mechatron. 2023, 28, 302–313. doi:10.1109/TMECH.2022.3193895
- 6. Dong H., Zhou J., Yu H. Robotic grasps of cylindrical and cubic objects via real-time learning-based shape detection. IEEE Trans. Automat. Sci. Eng. 2024, 22, 9681–9697. doi:10.1109/TASE.2024.3510592
- 7. Bolles R.C., Fischler M.A. A RANSAC-based approach to model fitting and its application to finding cylinders in range data. Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI); ACM: New York, NY, USA, 1981; pp. 637–643
- 8. Rabbani T., Van Den Heuvel F. Efficient hough transform for automatic detection of cylinders in point clouds. ISPRS J. Photogramm. Remote Sens. 2005, 36, 60–65
