A Fiducial-Marker-Based Localization Method for Automotive Chassis Bolt Assembly
Xiangqian Peng, Yingjie Xiao, Zhewu Chen, Kaijie Chen, Hong Huang

TL;DR
This paper introduces a method using fiducial markers to accurately locate bolts during automotive assembly, improving reliability in challenging industrial conditions.
Contribution
A novel localization method using a concentric ring marker and optimized ellipse fitting for automotive bolt assembly.
Findings
The improved Retinex-based algorithm enhances ROI extraction under uneven lighting.
Optimized ellipse fitting increases accuracy and reliability of bolt localization.
The method achieves high precision, meeting industrial automatic assembly requirements.
Abstract
To address the difficulty of accurately localizing automotive chassis bolts during the assembly process—caused by non-uniform illumination, limited camera installation space, and occlusions from the vehicle body structure—a fiducial-marker-based localization method is proposed. In this method, a concentric ring-shaped fiducial marker is affixed to the bottom of the assembly wrench, and its region of interest (ROI) is extracted using an HSV color space segmentation algorithm. To overcome interference from uneven lighting and insufficient brightness in industrial environments, an improved Retinex-based image enhancement algorithm is introduced, which significantly improves the robustness and accuracy of ROI extraction. The extracted ROI image is subjected to ellipse fitting, and the fitting process is optimized by incorporating the Leitz criterion. Experimental results show that the…
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Figure 19
Figure 20
Figure 21
Figure 22
Figure 23- —Natural Science Foundation of Hunan Province
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsImage and Object Detection Techniques · Optical measurement and interference techniques · Engineering Structural Analysis Methods
1. Introduction
In the automotive manufacturing process, the chassis serves as a key load-bearing structure with a complex design, requiring extremely high connection strength and assembly precision. A large number of bolts with varying specifications and complex positional distributions are used for fixation. Traditional assembly methods rely on manual confirmation of the bolt type, installation position, and corresponding torque-angle parameters based on process documentation, with assembly operations completed using a manual torque wrench. This process is not only labor-intensive and inefficient but also highly susceptible to human error, making it difficult to ensure assembly quality. With the development of industrial automation, the integration of bolt localization systems based on 3D stereoscopic vision technology with assembly equipment provides a new solution to this issue.
In automotive chassis bolt localization systems, accurate bolt localization presents significant challenges due to factors such as non-uniform illumination, limited camera installation space, and occlusions from vehicle body structures. Furthermore, each chassis image contains multiple bolts, with each bolt occupying a small area, which complicates precise localization. To address this issue, this paper proposes a fiducial-marker-based localization method for automotive chassis bolt assembly. A concentric ring-shaped fiducial marker is affixed to the bottom of the wrench, and precise localization of the automotive chassis bolt is achieved through the accurate localization of the marker’s center. While there has been no research on using fiducial markers for chassis bolt localization under a vehicle, extensive studies have been conducted internationally on fiducial markers in other application scenarios. For instance, Michail Kalaitzakis et al. [1] comprehensively compared the performance of common square markers (ARTag, AprilTag, ArUco, and STag) under various conditions. Their study shows that while square markers such as AprilTag exhibit strong robustness in occlusion scenarios, their pose estimation accuracy may degrade under extreme observation angles or non-uniform lighting in constrained industrial spaces. Lilian Calvet et al. [2] proposed a novel fiducial marker system called CCTag (Concentric Circle Tag), designed to address the failure of existing markers, especially square markers like AprilTag and simple circular markers, under adverse conditions such as motion blur, large defocus, low resolution, and non-uniform lighting. The concentric circle structure in the CCTag, which becomes concentric ellipses under perspective projection, possesses excellent geometric invariance properties. Tomáš Pivoňka et al. [3] developed a robust and accurate robot docking system using monocular vision and AprilTags, along with automatic calibration based on SfM, highlighting the algorithm’s ability to replace cumbersome manual environment measurements and significantly reduce deployment and maintenance barriers. Filippo Bergamasco et al. [4] proposed a new fiducial marker system called Rune-tag, based on a dot-array concentric circle design, which offers high occlusion resistance, accuracy, and projective invariance. The concentric ring fiducial marker used in this paper inherits the geometric invariance advantages of CCTag and Rune-tag, but applying CCTag and Rune-tag directly to chassis bolt localization still faces specific limitations. For example, while they excel at handling noise, they do not address the issue of feature extraction failure due to non-uniform lighting on the chassis. Additionally, most circular-marker-based localization methods focus on detection robustness but often overlook geometric eccentricity errors, which are critical in high-precision localization tasks. To overcome these challenges, this paper introduces improvements to existing techniques. Huang et al. [5,6,7] proposed a low-light image enhancement algorithm based on the Retinex theory to address the issue of imprecise bolt localization due to uneven lighting and low-light conditions in automotive chassis images. This method separates the illumination and reflection components of low-light images and enhances image details under low-light conditions through adaptive gain adjustment and color correction. In this paper, Huang et al.’s method, combining an improved Retinex algorithm with HSV segmentation, stably extracts the fiducial marker even under harsh, uneven industrial lighting conditions. Yang et al. [8] proposed an adaptive sub-pixel edge detection algorithm for locating nut hole centers, which analyzes pixel intensity distribution features of hole edges under Gaussian-weighted adaptive thresholds, and achieves precise automatic center determination using least squares circle fitting in noisy environments. Additionally, Wang et al. [9] proposed an ellipse parameterized geometric fitting method to address the low fitting accuracy of bolts under partial occlusion. This method analyzes the coupling relationships between ellipse geometric parameters and the calculation of geometric distance errors, using an improved least squares optimization algorithm to achieve high-precision fitting of industrial component elliptical profiles under high noise and partial occlusion conditions. This paper introduces the Leitz criterion to optimize ellipse fitting, achieving better fitting quality than the standard least squares method used in previous works. Since perspective projection of spatial circles causes displacement of the circle center, thus reducing bolt localization accuracy, Cui et al. [10,11] proposed a high-precision localization algorithm for the center of circular targets in visual measurement. By analyzing the geometric causes of center displacement during ellipse fitting and introducing corresponding correction methods, iterative optimization and sub-pixel edge extraction were used to achieve high-precision localization of circular target centers under complex angles and lighting conditions. This paper calculates the normal vector of the spatial circle plane and, using vanishing lines and the ellipse parameter matrix, computes the actual projected center of the concentric ring fiducial marker in the image, compensating for the marker’s shortcomings in lighting adaptability and projection deviation correction.
This paper improves and integrates existing technologies, successfully overcoming the challenges of automotive chassis bolt localization caused by non-uniform illumination, limited camera installation space, and occlusions from the vehicle body structure. It achieves high-precision localization of automotive chassis bolts, meeting the practical precision requirements for bolt assembly in industrial applications.
2. Feature Extraction of Circular Targets
In the automatic assembly process of automotive chassis bolts, spatial localization of the bolt end is primarily achieved by acquiring the coordinates of the concentric circular fiducial marker affixed to the bottom of the assembly wrench. This process involves capturing images of the concentric circular fiducial from the bottom of the wrench using two cameras positioned on the left and right sides. The region of interest (ROI) corresponding to the fiducial marker is then extracted using an HSV color space segmentation algorithm. Subsequently, a series of image processing steps—including binarization, edge detection, contour linking, and ellipse fitting—are performed on the extracted ROI image to extract the contour features of the concentric circular fiducial marker.
2.1. HSV-Based Color Segmentation Algorithm for Target Extraction
In the current industrialized automotive production process, the chassis structure is typically large in volume. To ensure that the designed camera system can capture the entire chassis area, a wide field of view must be configured. As a result, the circular fiducial marker affixed to the bottom surface of the assembly wrench occupies only a small portion of the acquired image. Therefore, a preliminary localization of the fiducial marker is required to extract its ROI, which facilitates subsequent processing [12].
This study employs an HSV color segmentation algorithm to identify the purple region of the target, thus achieving the initial localization of the target. The HSV color space consists of three components: Hue, Saturation, and Value. Compared to the traditional RGB color space, the HSV color space aligns more closely with human visual perception, providing a more intuitive representation of the color tone, brightness, and saturation of an image. As a result, it offers distinct advantages in color-based segmentation tasks [13].
Since images displayed on electronic devices are typically encoded in the RGB format, and the RGB channels are not well-suited for representing specific color characteristics, it is often necessary to convert RGB components into HSV components for more effective color processing. The conversion relationships [14] are defined as shown in Equation (1).
By setting appropriate value ranges for the H, S, and V components, specific colors can be effectively identified, enabling the initial localization of the target. Based on the actual lighting conditions in the industrial environment, a well-illuminated target image with preserved details was captured. The image was processed using MATLAB (R2022b) to adjust the ranges for H, S, and V. When the range of H was , the range of was S, and the range of V was , only the purple region corresponding to the concentric ring target on the bottom plane of the assembly wrench was detected. Therefore, the value ranges for H, S, and V were determined for subsequent processing.
2.2. Image Enhancement Processing:
In industrial environments, non-uniform illumination frequently occurs on automotive chassis surfaces, particularly in edge regions where lighting intensity is severely insufficient. Additionally, due to the large size of the chassis structure, image noise interference is significant, making it difficult to preserve the detailed features of the fiducial marker in poorly lit areas. This adversely affects subsequent visual observation and feature extraction. Under such conditions, the previously defined H, S, and V value ranges are insufficient for completely extracting the purple region of the circular fiducial marker. To address this issue, a Retinex-based image enhancement algorithm incorporating bilateral filtering is applied to enhance the fiducial marker image, thereby improving the visibility and contrast of the target region [15]. The overall image enhancement process is illustrated in Figure 1.
2.2.1. Improved Retinex Algorithm
The Retinex model is based on the human visual system’s color constancy hypothesis [16], which is used to decompose an observed image into illumination and reflection components, thereby enhancing image details and contrast under non-uniform lighting conditions. The single-scale Retinex algorithm is simple and effective, but it performs poorly in enhancing image details. To better capture and enhance structures and details at various scales within the image, a multi-scale Retinex algorithm is proposed, with the following expression:
Here, N represents the total number of scales, and denotes the weight assigned to each scale. When N = 3, the weights are typically set as w1 = w1 = w3 = 1/3; when N = 1, the multi-scale Retinex algorithm reduces to the single-scale Retinex algorithm.
In the Retinex algorithm, when estimating the illumination component using a Gaussian kernel, the scale parameter is highly sensitive to the enhancement effect: a large leads to insufficient contrast, while a small σ tends to produce halos and local distortions, which is particularly evident in industrial scenes with strong reflections, occlusions, and noise interference [17]. To improve edge preservation and suppress halo effects, this paper uses an improved bilateral filter to replace the Gaussian kernel as the central surrounding function for estimating the illumination component [18], with the weight function expressed as follows:
In the equation, denotes the position of the image center, and represents the grayscale intensity of the pixel at the center. is the standard deviation of the spatial-domain Gaussian function, and is the standard deviation of the range-domain Gaussian function.
Substituting Equation (3) into Equation (2) yields the multi-scale Retinex algorithm based on bilateral filtering theory. The logarithmic transformation is then applied to , resulting in the reflection image .
2.2.2. A Saturation Enhancement Algorithm Based on Contrast Stretching
When the hue component (H) remains unchanged, the overall visual effect of the image is primarily determined by the brightness and saturation. To enhance brightness while preserving edge structure, this paper introduces bilateral filtering to the brightness channel (V), resulting in clearer edge transitions. However, under conditions of insufficient or uneven illumination, images often exhibit low color vibrancy and reduced saturation levels. Enhancing brightness and details in such low-light scenarios may introduce color distortions if saturation is not properly adjusted. To mitigate this, a linear contrast stretching strategy is employed to optimize the saturation component, ensuring that its values are distributed effectively within the full dynamic range of [0, 255]. The computation of contrast stretching is given by Equation (4).
In the equation, denotes the input saturation component, while L and H represent the lower and upper bounds of the saturation stretching range, respectively.
2.3. Procedures for Target Marker Feature Extraction
Under actual industrial lighting conditions, an initial fiducial marker image with uniform illumination and clear detail is first captured. MATLAB is used to perform color space processing on the image, and the HSV value ranges across the three channels are extracted as reference standards for subsequent feature extraction. The fiducial image to be processed is then subjected to color space conversion and channel separation; the RGB image is decomposed into three channels and transformed into the HSV space. To mitigate the effects of non-uniform illumination, an improved multi-scale Retinex algorithm is applied to enhance the brightness (V) channel, thereby improving both image detail and overall luminance. In addition, linear contrast stretching is performed on the saturation (S) channel to enhance the color expressiveness of the image and improve robustness under complex lighting conditions. Pixel-level analysis is conducted on the enhanced image: if a pixel’s HSV values fall within the predefined reference range, it is marked as white, resulting in a binarized image of the fiducial’s purple region. Finally, a minimum bounding rectangle is used to enclose all white regions in the binarized image, and the area covered by this rectangle is defined as the final region of interest (ROI) to be extracted.
To verify the effectiveness of the proposed method, four fiducial marker images captured under low and uneven lighting conditions beneath the automotive chassis were selected, as shown in Figure 2. Image enhancement was applied to each of the original images, and the enhanced results are presented in Figure 3. Subsequently, the purple regions were extracted from both the original and enhanced images. The binarized image of the purple region from the original image is shown in Figure 4, while that from the enhanced image is shown in Figure 5. The final extracted regions of interest (ROI) are illustrated in Figure 6.
As shown in the figures below, the image enhancement process significantly improves overall image quality. Figure 2 shows the original image, in which the target object is visually merged with the background due to poor lighting conditions. This results in low contrast and blurred edges, making it difficult to distinguish the target from the background. Consequently, the corresponding binarized result in Figure 4 contains substantial noise artifacts and incomplete representation of the target object, hindering effective extraction of the target region. In contrast, Figure 3 displays the enhanced image, where the target object appears clearly defined, with strong contrast against the background and well-preserved edge details. These improvements facilitate more accurate feature extraction, as evidenced by the binarized result shown in Figure 5, which exhibits clear contours, complete structure, and distinct separation between the foreground target and the background, allowing the region of interest (ROI) of the fiducial marker to be effectively extracted. These results confirm that the proposed improved multi-scale Retinex image enhancement algorithm effectively addresses the difficulties in feature extraction caused by uneven illumination, especially in the poorly lit edge areas of the automotive chassis. The method enables stable and complete extraction of the fiducial marker’s purple region.
2.4. Canny Edge Detection Applied to the Region of Interest
The region of interest (ROI) of the fiducial marker, extracted using the HSV color space segmentation algorithm, undergoes Canny edge detection. Compared to direct thresholding or simple gradient operators, Canny edge detection is more suitable for subsequent contour extraction and ellipse fitting, as it performs better in noise suppression, edge localization, and edge continuity. The algorithm process includes: Gaussian smoothing—gradient calculation—non-maximum suppression—double-threshold hysteresis. The image after Gaussian filtering of the ROI is shown in Figure 7.
In the Canny algorithm process, the selection of dual thresholds directly impacts the edge continuity and noise resistance of Canny edge detection. Traditional approaches rely on empirically set high and low thresholds, which are difficult to adapt to industrial scenes with non-uniform illumination. This paper introduces the Otsu algorithm to enable adaptive selection of high and low thresholds during edge linking, thereby effectively improving the stability and continuity of edge detection [19].
For an image of size M × M, with a grayscale range of , the pixels are divided into two classes, C0 and C1, using a threshold t. The probabilities and means of these classes are defined as shown in Equation (5).
Based on the above results, the expression for the between-class variance of the image is given by Equation (6):
Through iterative computation, the segmentation threshold corresponding to the maximum between-class variance is identified as the optimal threshold. At this point, the low threshold is denoted as , and the high threshold is defined by Equation (7).
The Canny algorithm described above is applied to perform edge detection on the region of interest shown in Figure 6. The detected adjacent edge points are subsequently connected to form contours, as illustrated in Figure 8.
As shown in Figure 8, the Canny edge detection algorithm employed in this study effectively extracts the boundaries of the circular fiducial marker. The detected edge contours are continuous and clear, exhibiting good structural integrity. Although minor discontinuities are observed near the edges in some images, the overall structural features of the target object are well preserved. These results indicate that the Canny edge detection algorithm used in this work provides high edge detection accuracy.
2.5. Ellipse Fitting Performed Using the Least Squares Method
In practice, the marker affixed to the bottom surface of the assembly wrench consists of a concentric circular ring structure. However, due to the angular deviation between the imaging plane of the camera and the bottom surface of the wrench, the projected circular marker appears elliptical in the captured image. Therefore, ellipse fitting is performed on the region of interest (ROI) of the circular marker using the least squares method.
The least squares method is an optimal estimation technique derived from the maximum likelihood estimation principle, under the assumption that the random errors of the sample measurements follow a normal distribution [20]. The fundamental constraint of this method is to minimize the sum of algebraic distances between all sample points and the fitted curve. Let be a point in the plane and the equation of the curve, then represents the algebraic distance from point M to the curve. By applying the least squares method to the set of discrete sample points, an objective function for ellipse fitting is constructed, as shown in Equation (8). The coefficients of the ellipse equation are determined by minimizing this objective function.
According to the extremum principle, in order to minimize , it is only necessary to satisfy the following condition:
A linear system of equations is thereby obtained, from which the coefficients A, B, C, D, E and F of the ellipse equation can be solved using the full pivoting Gaussian elimination method, in conjunction with the imposed constraint conditions.
Based on the geometric properties of ellipses, five geometric parameters of the ellipse can be calculated, including the center position , the lengths of the semi-major and semi-minor axes , and the orientation angle of the major axis . In a two-dimensional plane, any ellipse can be fully characterized by these parameters.
2.6. Ellipse Fitting Algorithm Optimized by the Leitz Criterion
According to Gaussian error theory, when measurement values follow a normal distribution, the probability of the residuals falling within three times the standard deviation, i.e., within the interval, is greater than 99.7%, while the probability of falling outside this interval is less than 0.3%. Therefore, measurement values whose residuals fall outside this region can be considered outliers [21]. This is the Leitz criterion for outlier detection, which is also commonly referred to as the method.
When performing least squares fitting on discrete sample points, the presence of outliers in the sample points can significantly affect the curve fitting results. Such points should be excluded from the curve to be fitted. In this paper, the Leitz criterion is used to remove outliers from the sample points.
Using the objective function defined in Equation (8), the fitted ellipse in the pixel coordinate system yields its center , major axis a, minor axis b, and orientation angle . Let N be the number of edge points used in the fitting. The coordinates of each edge point can be obtained using OpenCV functions. Denoting the coordinates of an edge point as , the algebraic pixel distance between this point and the ellipse center is then calculated accordingly:
Perform data analysis on the sample point set based on the Leitz criterion, removing the error points present in . According to the Leitz criterion, the arithmetic mean of the sample point set is first calculated:
Subsequently, the standard deviation is calculated:
The residuals of the data set are then calculated as follows:
According to the Leitz criterion, if or , then point is considered an outlier. Using the correlation function, this point is excluded from the curve to be fitted. The least squares method is then applied to perform ellipse fitting on the new curve to be fitted. This process is repeated until the residuals of fall within , indicating that there are no outliers in the data set. The final fitted ellipse is the optimal ellipse.
An error analysis was conducted on the estimated contour radius of the circular marker using computational algorithms. The designed circular marker consists of a white circle with a radius of 10 mm and a black ring with a radius of 15 mm. The detailed dimensions of the marker are shown in Figure 9. Under actual industrial conditions, 30 images of the circular marker were captured. Two different ellipse fitting algorithms were applied to each image to fit the circular marker. Using a binocular vision imaging model, the coordinates of all ellipse edge points and the ellipse center were computed in the camera coordinate system. For the white circle, the Euclidean distances from all fitted edge points to the ellipse center were calculated, and their average was taken as the estimated radius. The estimated radius of the white circle is shown in Figure 10. The same method was applied to estimate the radius of the black ring, with the result shown in Figure 11. The standard deviation, mean error, maximum error, and minimum error of the estimated radii were computed for both the white circle and black ring. The final error analysis results are summarized in Table 1.
As shown in Table 1, the ellipse fitting algorithm optimized using method results in significant reductions in the standard deviation, mean error, and peak error of the two estimated radii. This demonstrates that the algorithm effectively reduces ellipse fitting errors and substantially improves fitting accuracy.
The ellipse fitting algorithm optimized by the Leitz criterion using the least squares method is applied to the region of interest in Figure 6. The fitting result is shown in Figure 12. It can be observed that the algorithm accurately fits the contour of the circular marker and yields five geometric parameters: the ellipse center position , the lengths of the semi-major and semi-minor axes , and the orientation angle of the major axis . These parameters lay the foundation for the subsequent spatial positioning of the bolt.
3. Precise Localization of the Spatial Circle Center
During image acquisition of the circular fiducial marker, the principal optical axis of the camera is not necessarily perpendicular to the plane of the marker. As a result, the spatial circle appears as an ellipse in the image due to perspective projection. It is important to note that the center of the resulting ellipse does not correspond to the true projected position of the spatial circle center in the image; a deviation exists between the two [22]. Based on the method proposed in [23], an improved approach is adopted in this study. The ambiguity in the calculation of the normal vector of the circular plane caused by dual solutions is resolved using binocular vision. Furthermore, the actual projected center of the spatial circle is determined by incorporating vanishing lines and the ellipse parameter matrix.
3.1. Analysis of Spatial Circle Center Deviation
When the plane of a spatial circle is not parallel to the image plane of the camera, its projection onto the image plane appears as an ellipse. In practical applications, the geometric center of this ellipse is often mistakenly considered as the projected point of the spatial circle’s center. However, this assumption can introduce significant localization errors. As illustrated in the perspective projection diagram of the spatial circle in Figure 13, point denotes the optical center of the camera, and the camera coordinate system is established, with representing the camera’s optical axis. In the world coordinate system, a circle with radius r is defined, and its center is located at point C. After perspective projection through the camera, the circle appears as an ellipse on the image plane, with the ellipse center located at point , while the true projection of the circle center appears at point .
As shown in Figure 13, the projected point of the circle center does not coincide with the geometric center of the ellipse. If the geometric center of the ellipse is directly treated as the projected point of the circle center, it will inevitably introduce errors into the subsequent measurement and localization results [22].
3.2. Computation of the Normal Vector of the Spatial Circular Plane
To obtain the actual projected point of the spatial circle center, the normal vector of the spatial circular plane must first be calculated. Taking the white circle of the fiducial marker as an example, the elliptical contour features of the marker are extracted from the left and right images through image processing. The ellipse contours are then discretized to obtain the pixel coordinates of the contour points. Let point G represent the optical center of the camera. By traversing the ellipse contour points, points p and q are identified such that the angle reaches its maximum. A plane bGc is then constructed, which is perpendicular to the plane pGq and bisects the angle . The intersections of plane bGc with the ellipse contour yield points b and c, as illustrated in Figure 14. According to reference [23], the formula for calculating the normal vector of the spatial circular plane is given in Equation (14).
In the equation, and represent the two possible solutions for the normal vector of the spatial circular plane. Let f denote the focal length of the lens, and and can be calculated using Equation (15).
Using binocular vision, two possible solutions, and , for the normal vector can be obtained from the left camera image, while two possible solutions, and , can be derived from the right camera image. Among each pair, one vector corresponds to the correct normal direction. However, due to measurement errors, the two correct normal vectors are not exactly identical. The angle between the normal vectors obtained from the left and right images is calculated as shown in Equation (16).
The minimum value is selected from among , , and , The normal vectors and corresponding to , are determined as the correct normal vectors for the left and right images, respectively. In this way, the dual-solution problem of the spatial circular plane’s normal vector is effectively resolved.
3.3. Computation of the Spatial Circle Center Coordinates
In 3D space, parallel planes intersect at infinity, and their intersection line is projected onto the image as the vanishing line of the plane. Given the camera’s intrinsic parameter matrix M, obtained through camera calibration, the vanishing line L of the spatial circular plane in the camera coordinate system can be computed using the formula shown in Equation (17).
According to the fitted equation of the fiducial ellipse expressed in Equation (8), the parameter matrix w of the ellipse is defined as follows:
The actual projected point o of the spatial circle center on the image plane is thus determined, as shown in [23]:
The actual projected points and of the spatial circle center on the left and right camera images, respectively, are computed. Based on the binocular vision imaging model, the spatial coordinates of the circle center in the camera coordinate system are then calculated, thereby achieving precise localization of the spatial circle center.
3.4. Experimental Analysis of Precise Localization of the Spatial Circle Center
To verify the effectiveness of the proposed algorithm, experimental validation and analysis were conducted. Under real-world industrial conditions, 30 fiducial marker images captured at various positions within the binocular camera’s field of view were selected for testing. As shown in Figure 15, the images of the fiducial markers within the binocular field of view are presented, where the cross mark indicates the ideal projection point of the center of the white circle in the marker.
The projection coordinates of the circle center were calculated using the method proposed in this paper. Table 2 lists a subset of the ideal projection points, the ellipse-fitted centers, and the computed projection coordinates of the circle center. Table 3 presents the error analysis data, showing the distances from both the ellipse-fitted centers and the computed projection points to the corresponding ideal projection points.
As shown by the data in Table 2 and Table 3, the distance error between the ellipse-fitted center and the ideal projection point is relatively large, with an average error of 2.5048 pixels. In contrast, the average error of the center coordinates computed using the proposed algorithm is reduced to 0.1230 pixels. This demonstrates that the circle center coordinates obtained by the proposed method are significantly closer to the ideal projection point, indicating the effectiveness of the approach.
To further verify the effectiveness of the proposed algorithm, the binocular camera system was calibrated using the MATLAB toolbox. Through calibration, the intrinsic parameters of the binocular system, as well as the rotation and translation between the left and right cameras, were obtained. Based on the binocular stereo vision model [24], the 3D coordinates of the points were then computed. By calculating the Euclidean distances between the projected center points and comparing them with the theoretical lengths of a reference rod, the effectiveness of the proposed method was evaluated. A schematic diagram of the reference rod is shown in Figure 16, where the theoretical lengths of segments D1, D2, D3, and D4 are 882.546 mm, 882.753 mm, 882.473 mm, and 882.902 mm, respectively. The captured images of the reference rod are presented in Figure 17 with Figure 17a showing the left camera image and Figure 17b showing the right camera image.
A total of 10 reference rod images were captured within the field of view of the binocular camera system. The proposed algorithm was used to calculate the center pixel coordinates from both the left and right camera images. Table 4 lists a subset of the computed center pixel points and ellipse-fitted centers in the left camera images. DL represents the pixel coordinates of the left circle center corresponding to a given length, and DR represents the pixel coordinates of the right circle center. As shown in the table, there is a noticeable difference between the ellipse-fitted centers and the computed center pixel points.
The 3D coordinates of the points were calculated using both the ellipse-fitted centers and the center pixel points obtained through the proposed method. The Euclidean distances between the resulting 3D points were then computed and compared with the standard values of the reference rod. Due to the large volume of data and space limitations, only a subset of the Euclidean distance error analysis results is presented in Table 5, while Figure 18 illustrates a comparison of Euclidean distance errors for all data samples.
The average Euclidean distance error of the center pixel points computed by the proposed algorithm was 0.176 mm, which is significantly lower than the average error of 0.644 mm obtained from the ellipse-fitted centers. Furthermore, both the maximum and minimum Euclidean distance errors of the center pixel points calculated by the proposed method were smaller than those of the ellipse-fitted centers. These results validate the effectiveness and accuracy of the center pixel computation method presented in this paper.
4. Experimental Validation and Analysis
In automotive chassis bolt localization tasks, a typical approach uses a single pair of stereo cameras to localize bolts on the chassis, which limits the system’s field of view and application range. This paper proposes a multi-camera stereo vision system for automotive chassis bolt localization by deploying four pairs of stereo cameras symmetrically on both sides of the automotive assembly station. The goal is to increase the system’s field of view and adaptability. The study focuses on automotive chassis bolts, where a robotic arm’s traction mechanism guides the assembly wrench near the bolt to be assembled. The stereo vision system is then used to spatially localize the fiducial marker on the bottom of the wrench. Based on the known spatial geometric relationship between the marker and the bolt end, the position parameters of the bolt end in the vehicle coordinate system are derived. These parameters are then matched with the tightening process parameters from the database, and the matched tightening parameters are sent to the assembly wrench for assembly. The overall framework of the experimental system is designed based on the above requirements, as shown in Figure 19a.
Based on the overall system structure plan, a corresponding visual localization experimental platform has been built, and its overall layout is shown in Figure 19b. The experimental platform mainly includes: a robotic arm, stereo cameras, lighting system, lifter, and computer. For the visual perception part, a Hikvision industrial camera (model: MV-CS200-10UC) is used to form the stereo camera system, with a matching lens model of MVL-KF1224M-25MP and a lens focal length of 12 mm. The lighting system employs a white diffuse strip light source to address the issue of insufficient lighting on automotive chassis in industrial environments. During the experiment, a double-column electric hydraulic lift is used to support and adjust the height of the automotive chassis. The computing and control unit is a computer equipped with an Intel Core i9-9750H processor and a 64-bit Windows 10 operating system, which is used for image processing, algorithm execution, and system control.
In actual industrial conditions, the vehicle is fixed on the lift platform, and the chassis is raised to the effective working plane of the cameras through the lifting motion. Subsequently, the robotic arm pulls the assembly wrench to the target bolt position, and the stereo vision system is used to localize the automotive chassis bolt, obtaining the position parameters of the bolt end in the vehicle coordinate system. These parameters are then matched with the tightening process parameters from the database, and the matched tightening parameters are transmitted to the assembly wrench to complete the assembly. As shown in Figure 20, this is an image captured by the camera of the operator performing the automotive chassis bolt localization. It should be noted that when the wrench is perpendicular to the automotive chassis in the vehicle coordinate system, the center of the fiducial marker at the end of the assembly wrench has the same (X, Y) coordinates as the bolt end on the chassis. Using the algorithm presented in this paper, the actual projected center of the fiducial marker is calculated in the camera coordinate system. The coordinates of the marker center in the camera coordinate system are then transformed to the vehicle coordinate system using affine transformation [25]. This provides the (X, Y) coordinates of the marker center in the vehicle coordinate system, which corresponds to the (X, Y) coordinates of the bolt end in the vehicle coordinate system obtained through localization. Therefore, the (X, Y) coordinates of the bolt end obtained through localization are compared with the theoretical (X, Y) coordinates of the bolt hole to evaluate the system’s localization accuracy and verify the feasibility of the proposed visual bolt localization scheme.
4.1. Robustness Evaluation of Ellipse Fitting Under Disturbances
Strip light sources were positioned on both sides of the vehicle. When the vehicle was lifted to the working plane by the hoist, the illumination of the chassis was observed to be non-uniform due to the large structural size of the automotive chassis, exhibiting a gradual darkening trend from the center toward the edges. When the bolts to be localized were located in poorly illuminated regions, the corresponding fiducial marker images exhibited higher levels of noise. Therefore, the ellipse fitting algorithm proposed in this study must be capable of handling the increased noise present under low-light conditions. The following section evaluates the performance of the proposed algorithm under salt-and-pepper noise.
Salt-and-pepper noise is a type of random noise in which certain pixels in an image are randomly selected and their values replaced with either black or white. This type of noise is typically caused by communication errors during data transmission, where the values of some pixels may be lost, thereby degrading image quality [26]. In the experiment, 50 fiducial marker images were selected from various positions under the automotive chassis and at different angles relative to the camera. Within the region of interest (ROI) of the marker (measuring 120 pixels × 75 pixels), 200, 300, and 400 salt-and-pepper noise points were added, respectively, to evaluate the robustness of the proposed ellipse fitting algorithm. The experimental results under salt-and-pepper noise interference are shown in Figure 21.
Under salt-and-pepper noise interference, an error analysis was conducted on the proposed ellipse fitting algorithm. The standard deviation, mean error, maximum error, and minimum error of the estimated radii for the white circle and black ring were calculated. The error analysis results under salt-and-pepper noise are summarized in Table 6 Experimental results demonstrate that the proposed algorithm remains capable of accurately fitting the fiducial marker ellipses even in the presence of noise, maintaining high precision. These findings confirm that the visual localization method proposed in this study exhibits strong robustness and anti-interference capability under complex lighting conditions encountered in real industrial environments, thereby meeting the practical requirements for stability and accuracy in automated bolt assembly applications.
4.2. Analysis of Positioning Accuracy
Fifty bolts located at different positions and depths on the vehicle chassis were selected for positioning. The X and Y coordinates of the bolt endpoints obtained through the proposed localization method were compared with the theoretical X and Y coordinates of the corresponding bolt hole centers. The relative positioning errors were analyzed to determine whether the method satisfies the accuracy requirements of practical industrial assembly environments.
Figure 22 presents the line chart of relative errors along the X axis. The relative positioning error ranges from −3.642 mm to 3.545 mm, with a mean relative error of 1.884 mm and a variance of 1.236 mm^2^.
Figure 23 presents the line chart of relative errors along the Y axis. The relative positioning error ranges from −3.225 mm to 3.834 mm, with a mean relative error of 1.844 mm and a variance of 1.001 mm^2^.
As observed from the figure, the positioning errors at the bolt endpoints exhibit a random distribution. This variability can be attributed to the non-uniform illumination conditions under the vehicle chassis and the operational randomness introduced by manual handling during each bolt positioning process. Nevertheless, all positioning errors remain within 4 mm, which meets the accuracy requirements of industrial assembly processes and demonstrates the method’s applicability in real-world production environments.
5. Conclusions
(1)To address the challenge of accurately locating bolts during automotive chassis assembly caused by non-uniform illumination, limited camera installation space, and structural occlusion, a fiducial-marker-based localization method for bolt assembly was proposed. An image processing pipeline was designed to extract the fiducial features on the bottom surface of the assembly wrench, comprising HSV color space segmentation, binarization, edge detection, contour connection, and ellipse fitting. In the color segmentation stage, an improved Retinex-based image enhancement algorithm was integrated to significantly improve the robustness and accuracy of region of interest (ROI) extraction under challenging lighting conditions. For ellipse fitting within the ROI, the Leitz criterion was introduced to optimize the fitting process. Experimental results demonstrated that the optimized ellipse fitting algorithm achieved higher accuracy and substantially improved fitting quality.(2)Due to the perspective projection characteristics of spatial circles, a circular marker in 3D space is projected as an ellipse on the imaging plane of the camera, and the center of this ellipse does not correspond to the actual projection of the spatial circle’s center. A method for computing the true projection point of the spatial circle center on the image plane was investigated. Experimental results showed that the average error between the computed circle center and the ideal projection point was reduced from 2.5048 pixels to 0.1230 pixels using the proposed algorithm.(3)Experimental results demonstrated that the proposed localization method for automotive chassis bolt assembly exhibited high stability and effectively addressed the issue of non-uniform illumination under the chassis. The algorithm showed strong anti-interference capability, and the localization results were verified by comparing them with the theoretical coordinates of the bolt holes. The achieved positioning accuracy was within millimeter-level precision, enabling high-accuracy localization of chassis bolts. This satisfies the precision requirements of industrial assembly processes and demonstrates the method’s applicability in real-world manufacturing environments.(4)The localization method proposed in this paper has been validated in industrial environments and demonstrates high precision and robustness, meeting the localization requirements in the bolt assembly process. Additionally, the method has good scalability; it is not only applicable to automotive chassis bolt localization but can also be extended to other complex industrial assembly scenarios, such as part localization in electronic product assembly and component localization in aerospace assembly. Compared to learning-based algorithms, the method in this paper does not rely on large amounts of training data or complex models. Instead, it achieves precise localization through optimization strategies based on geometry and image processing, with relatively low computational cost. Furthermore, with the improvement of computational capabilities, this method is expected to integrate with deep learning technologies in the future, further enhancing its adaptability and intelligence, and expanding its application to localization tasks in more dynamic and complex environments.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Kalaitzakis M. Cain B. Carroll S. Ambrosi A. Whitehead C. Vitzilaios N. Fiducial markers for pose estimation: Overview, applications and experimental comparison of the AR Tag, April Tag, Ar Uco and S Tag markers J. Intell. Robot. Syst.20211017110.1007/s 10846-020-01307-9 · doi ↗
- 2Calvet L. Gurdjos P. Griwodz C. Gasparini S. Detection and accurate localization of circular fiducials under highly challenging conditions Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition IEEE Piscataway, NJ, USA 2016562570
- 3Pivoňka T. Sell R. Pikner H. Přeučil L. Fiducial marker-based monocular localization for autonomous docking IFAC-Papers On Line 2023562957296210.1016/j.ifacol.2023.10.1419 · doi ↗
- 4Bergamasco F. Albarelli A. Rodola E. Torsello A. Rune-tag: A high accuracy fiducial marker with strong occlusion resilience Proceedings of the CVPR 2011 IEEE Piscataway, NJ, USA 2011113120
- 5Lv X. Sun Y. Zhang J. Jiang F. Zhang S. Low-light image enhancement via deep Retinex decomposition and bilateral learning Signal Process. Image Commun.20219911646610.1016/j.image.2021.116466 · doi ↗
- 6Kim S.-W. Jung J.-Y. Yoo C.-H. Ko S.-J. Retinex-based illumination normalization using class-based illumination subspace for robust face recognition Signal Process.201612034835810.1016/j.sigpro.2015.09.028 · doi ↗
- 7Kim P. Coltin B. Alexandrov O. Kim H.J. Robust visual localization in changing lighting conditions Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA)Singapore 29 May–3 June 2017 IEEE Piscataway, NJ, USA 201754475452
- 8Yang R. Guo H. Chen Z. Sun J. Adaptive Subpixel Edge Detection for Locating the Center of Nut Screw Hole Int. J. Precis. Eng. Manuf.2021221357136410.1007/s 12541-021-00544-8 · doi ↗
