Fast Three-Dimensional Profilometry with Large Depth of Field
Wei Zhang, Jiongguang Zhu, Yu Han, Manru Zhang, Jiangbo Li

TL;DR
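This paper introduces a fast 3D profilometry method that determines projector coordinates from the peaks of time-domain Gaussian curves, using a neural network to accelerate the peak computation and defocused binary patterns to obtain a large depth of field.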
Contribution
The novel contribution is a time-domain Gaussian fitting method combined with a neural network for rapid 3D profilometry.
Findings
The proposed method extends the system's depth of field about fivefold, from 150 mm to 750 mm.
Data acquisition takes 32.5 ms and computation takes 35 ms per 3D scan.
Because each time-domain Gaussian curve is extracted from a single image pixel, its shape is not deformed by complex surfaces.
Abstract
By applying a high projection rate, the binary defocusing technique can dramatically increase 3D imaging speed. However, existing methods are sensitive to the varied defocusing degree, and have limited depth of field (DoF). To this end, a time–domain Gaussian fitting method is proposed in this paper. The concept of a time–domain Gaussian curve is firstly put forward, and the procedure of determining projector coordinates with a time–domain Gaussian curve is illustrated in detail. The neural network technique is applied to rapidly compute peak positions of time-domain Gaussian curves. Relying on the computing power of the neural network, the proposed method can reduce the computing time greatly. The binary defocusing technique can be combined with the neural network, and fast 3D profilometry with a large depth of field is achieved. Moreover, because the time–domain Gaussian curve is…
Funding
- National Key Research and Development Program of China
- University Key Research Project of Anhui Province
- Guangdong Basic and Applied Basic Research Foundation
- Outstanding Scientist Cultivation Project of Beijing Academy of Agriculture and Forestry Sciences
- Excellent Research and Innovation Team of Universities at Anhui Province
Taxonomy
Topics: Optical measurement and interference techniques · Image and Object Detection Techniques · Remote Sensing and LiDAR Applications
1. Introduction
Quickly and accurately acquiring a 3D point cloud of an object’s surface is important in numerous fields, such as quality control, robotic assembly, medical treatment, virtual reality, and reverse engineering [1,2,3]. Being non-contact, fast, and accurate, fringe projection profilometry (FPP) has become one of the most promising 3D imaging techniques. In a conventional FPP system, a set of 8-bit sinusoidal patterns is projected onto the object surface. Because 8-bit gray patterns have a limited projection rate (within 120 Hz), the measurement speed of the FPP system is restricted [4].
By applying 1-bit binary patterns, which have much higher projection rates (up to 20 kHz), the binary defocusing technique can greatly improve 3D imaging speed [5,6,7]. The squared binary defocusing method (SBM) is the simplest binarization strategy; it uses square-wave binary patterns to create sinusoidal fringes [8]. Conventional binary defocusing techniques require a proper defocusing degree to achieve ideal sinusoidal fringes; otherwise, significant measurement errors may arise. They are therefore sensitive to the defocusing degree and have a small DoF. Various advanced binarization strategies have been proposed to enhance the DoF, such as sinusoidal pulse width modulation (SPWM) [9], optimal pulse width modulation (OPWM) [10], and the dithering method [11]. Because these methods still require a proper defocusing degree to generate sinusoidal fringes, their enhancements in DoF are rather limited.
Many methods have been introduced to minimize the measurement errors caused by improper defocusing degrees. After projecting 8-bit gray patterns and binary patterns onto a white board, respectively, Xu et al. obtained the phase error distribution over a large depth range and then built a mathematical model to eliminate the phase error at arbitrary depths [12]. Hu et al. used depth-discrete Fourier series fitting to reduce the complexity of the phase error model [13]. In Zhu’s model, more influencing factors were taken into account (including defocusing level, intensity noise, and fringe frequency), so that the optimal fringe frequency of the binary error-diffusion fringe pattern can be selected [5]. Yu et al. achieved accurate 3D reconstruction in a large DoF by directly transforming the captured patterns into the desired phase with deep learning models [14]. Although these methods may work well for error compensation, collecting accurate phase errors over a large DoF is tedious and requires expensive equipment.
In this paper, a time-domain Gaussian fitting method is proposed to reduce the sensitivity to the defocusing degree. Unlike the phase-shifting algorithm, it obtains projector coordinates by projecting Gaussian fringes and determining the peak positions of time-domain Gaussian curves. A neural network is applied to rapidly compute these peak positions. Finally, by generating Gaussian fringes with defocused binary patterns, the time-domain Gaussian fitting method can be combined with the binary defocusing technique. The high projection rate can then be used in FPP with much lower sensitivity to the defocusing degree, which helps to achieve fast three-dimensional profilometry with a large DoF.
Centerline extraction is also adopted in line-structured light [15] and multi-line shift profilometry [16], and many algorithms have been proposed, such as the Steger algorithm [17] and the skeleton extraction method [18]. However, these algorithms work with the spatial distribution of Gaussian fringes, which are modulated by the object’s surface and may deform accordingly. This makes it difficult to measure complex surfaces accurately. By comparison, because the time-domain Gaussian curve is extracted from an individual image pixel, its shape does not deform with the surface, which helps in acquiring accurate measurements of complex surfaces.
Compared with the traditional strategy, which imitates sinusoidal stripes with a proper defocusing degree, Gaussian stripes can be easily generated with a simple binary pattern. Whereas sinusoidal stripes carry the measurement in their phase, Gaussian stripes carry it in their peak positions. Although a varied defocusing degree changes the blur radius of the Gaussian stripes, their peak positions remain fixed. The varied defocusing degree therefore has much less impact on the proposed method. Although a neural network can be used to reduce the computing time, extracting the peak positions of time-domain Gaussian curves is indeed more complex than calculating the phase from sinusoidal stripes.
The rest of this paper is organized as follows. The principle of the time-domain Gaussian fitting method is explained in Section 2. The neural network-based rapid calculation approach is described in detail in Section 3. Sensitivity to the defocusing degree and to complex surfaces is analyzed in Section 4. The performance of the proposed method is verified in Section 5, and its characteristics are summarized in Section 6.
2. Principle
2.1. Determining the Projector Coordinate with a Time-Domain Gaussian Curve
The multi-line binary patterns (P_1, P_2, ···, P_n) with uniform intervals are designed to generate the Gaussian fringes. From one pattern to the next, the lines shift by a fixed step (dv) along the projector axis V. The interval between two adjacent lines equals the product of the shifting step (dv) and the number of shifting steps (n). By defocusing the binary multi-line patterns, evenly spaced Gaussian fringes can be created to illuminate the objects. As shown in Figure 1, when the multi-line binary patterns are sequentially projected by the digital projector, the generated Gaussian fringes also shift at a constant speed, and the images of the Gaussian fringes (I_1, I_2, ···, I_n) are captured synchronously. At an image coordinate (x, y), the intensity sequence I_i(x, y) uniformly samples the Gaussian fringe and forms a time-domain Gaussian curve.
Because the shifting steps of the multi-line patterns are identical and directed along the projector axis V, the projector coordinate, v, can be set as the horizontal axis of the time-domain Gaussian curve (see Figure 1). Suppose that a line in the projector pattern has shifted by a distance Δv at the moment the intensity of the time-domain Gaussian curve reaches its highest value (the peak of the curve). At this moment, the projector coordinate of the line corresponds to the image coordinate (x, y), so the shifting distance, Δv, can be regarded as the relative projector coordinate of the line. If the initial coordinate of the line is set to zero, its relative projector coordinate equals the peak position of the time-domain Gaussian curve, v_p (Δv = v_p). Therefore, the projector coordinate corresponding to image coordinate (x, y) can be determined by finding the peak position of the time-domain Gaussian curve.
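The sampling process described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: the function name `time_domain_curve`, the Gaussian blur profile, and all parameter values are assumptions chosen for illustration. It builds the intensity sequence seen by one pixel while the lines shift by dv per pattern, and shows that the sequence peaks at the step where a line passes over the pixel's projector coordinate.

```python
import math

def time_domain_curve(v0, n_steps=8, dv=1.0, sigma=1.5, n_lines=9):
    """Intensity sequence seen by one pixel that maps to projector coordinate v0.

    Assumed model: each binary line is blurred into a Gaussian of width sigma,
    and adjacent lines are n_steps * dv apart.
    """
    spacing = n_steps * dv            # interval between adjacent lines
    half = n_lines // 2
    samples = []
    for i in range(n_steps):
        shift = i * dv                # line displacement in pattern P_(i+1)
        intensity = 0.0
        for k in range(-half, half + 1):
            line_pos = k * spacing + shift
            intensity += math.exp(-((v0 - line_pos) ** 2) / (2.0 * sigma ** 2))
        samples.append(intensity)
    return samples

curve = time_domain_curve(v0=3.0)
peak_step = max(range(len(curve)), key=curve.__getitem__)
print("samples:", [round(s, 3) for s in curve])
print("brightest step:", peak_step)
```

With a shifting step of one column, the brightest sample falls at the step index equal to the pixel's relative projector coordinate, which is the correspondence exploited by the method.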
In this paper, the time-domain Gaussian curve is modeled as a one-dimensional Gaussian function:
$$G(v) = \lambda + \eta \exp\left(-\frac{(v - v_p)^2}{2\sigma^2}\right) \tag{1}$$
where λ represents the bias, η denotes the scale factor, σ denotes the standard deviation (width) of the curve, and v_p is the peak position of the time-domain Gaussian curve.
The peak position, v_p, can be determined by minimizing the following objective function with the Levenberg–Marquardt algorithm [19]:
$$\min_{\lambda,\,\eta,\,\sigma,\,v_p} \sum_{i=1}^{n} \left[ I_i(x, y) - \lambda - \eta \exp\left(-\frac{(v_i - v_p)^2}{2\sigma^2}\right) \right]^2 \tag{2}$$
where n represents the number of captured images, v_i denotes the abscissa values of the time-domain Gaussian curve, and x and y indicate the coordinates in both directions on the image plane. Because Equation (1) contains four undetermined parameters (λ, η, σ, and v_p), at least four elements should be included in the time-domain Gaussian curve to yield a reliable result. This means that the number of captured images should be no less than 4 (n ≥ 4).
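The fitting step can be sketched as follows. This is a hedged, dependency-free illustration rather than the authors' code: it assumes the Gaussian model has the form bias + scale · exp(−(v − v_p)² / (2σ²)), uses a basic Levenberg–Marquardt loop with a simple accept/reject damping rule, and the names `gaussian`, `solve`, `cost`, and `fit_peak` as well as the synthetic data are hypothetical. In practice a library optimizer would normally be used instead.

```python
import math

def gaussian(v, lam, eta, sigma, vp):
    """Assumed curve model: lam + eta * exp(-(v - vp)^2 / (2 sigma^2))."""
    return lam + eta * math.exp(-((v - vp) ** 2) / (2.0 * sigma ** 2))

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [a[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def cost(p, vs, obs):
    return sum((o - gaussian(v, *p)) ** 2 for v, o in zip(vs, obs))

def fit_peak(vs, obs, iters=100):
    """Return [lam, eta, sigma, vp] minimizing the squared residuals."""
    lam0 = min(obs)
    # Initial guess: bias = min, scale = range, peak = brightest sample.
    p = [lam0, max(obs) - lam0, 1.0, vs[obs.index(max(obs))]]
    mu, best = 1e-3, cost(p, vs, obs)
    for _ in range(iters):
        lam, eta, sigma, vp = p
        jtj = [[0.0] * 4 for _ in range(4)]
        jtr = [0.0] * 4
        for v, o in zip(vs, obs):
            d = v - vp
            e = math.exp(-(d * d) / (2.0 * sigma ** 2))
            # Partial derivatives of the model w.r.t. lam, eta, sigma, vp.
            grad = [1.0, e, eta * e * d * d / sigma ** 3, eta * e * d / sigma ** 2]
            r = o - (lam + eta * e)
            for a in range(4):
                jtr[a] += grad[a] * r
                for b in range(4):
                    jtj[a][b] += grad[a] * grad[b]
        for a in range(4):
            jtj[a][a] *= 1.0 + mu          # Marquardt damping of the diagonal
        step = solve(jtj, jtr)
        trial = [pi + si for pi, si in zip(p, step)]
        trial_cost = cost(trial, vs, obs)
        if trial_cost < best:              # accept the step, trust the model more
            p, best, mu = trial, trial_cost, mu * 0.1
        else:                              # reject the step, damp more strongly
            mu *= 10.0
        if max(abs(s) for s in step) < 1e-12:
            break
    return p

# Synthetic, noise-free curve with an off-grid peak (illustrative values).
vs = [float(k) for k in range(8)]
obs = [gaussian(v, 10.0, 100.0, 1.5, 3.3) for v in vs]
lam, eta, sigma, vp = fit_peak(vs, obs)
print("estimated peak position:", round(vp, 4))
```

The iterative nature of this loop, repeated for every camera pixel, is what makes the fit slow and motivates the neural network approach of Section 3.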
In the proposed method, because the binary lines are evenly spaced in the projector pattern, the maximum value of the relative projector coordinate, Δv, is restricted by the distance between two adjacent lines. Just like the wrapped phase map in the phase-shifting method [4], the relative projector coordinates Δv(x, y) can be converted into the absolute projector coordinates v(x, y) using a phase-unwrapping method [20]:
$$v(x, y) = \Delta v(x, y) + D \cdot C(x, y) \tag{3}$$
where D is the distance between two adjacent lines in the projector pattern, and C(x, y) represents the coded values for phase unwrapping.
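As a worked illustration of the unwrapping step, the sketch below assumes the usual additive form, absolute coordinate = relative coordinate + line spacing × code value. The function name `absolute_coordinate` and the sample numbers are hypothetical.

```python
def absolute_coordinate(delta_v, code, line_spacing):
    """Combine a relative coordinate with its integer fringe code.

    Assumed form: v = delta_v + D * C, where D is the distance between
    adjacent lines and C is the decoded (e.g. gray-code) fringe index.
    """
    return delta_v + line_spacing * code

# A pixel whose curve peaks at 3.25 columns inside the 6th fringe (code 5),
# with lines spaced 8 columns apart.
v_abs = absolute_coordinate(delta_v=3.25, code=5, line_spacing=8)
print(v_abs)
```

The relative coordinate only ever spans one line interval, so the code value is what disambiguates between fringes.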
2.2. Polynomial 3D Reconstruction Model
In an FPP system, a 3D reconstruction model is required to convert the distribution of projector coordinates into 3D coordinates. Among the existing 3D reconstruction models, the polynomial reconstruction model is more flexible in taking nonlinear factors (such as lens distortion in the camera and projector) into account [21]. Although a polynomial reconstruction model of higher order is more accurate, it is prone to ill-conditioning if the order is higher than three [21]. Therefore, a third-order polynomial model is employed in this work, which can be formulated as the following:
$$X = \sum_{k=1}^{20} a_k m_k, \qquad Y = \sum_{k=1}^{20} b_k m_k, \qquad Z = \sum_{k=1}^{20} c_k m_k \tag{4}$$
where X, Y, and Z denote the 3D coordinate vectors, m_1, ···, m_20 are the monomials x^i y^j v^l (i + j + l ≤ 3) of the image coordinates (x, y) and the projector coordinate v, and (a1, a2, ···, a20), (b1, b2, ···, b20), and (c1, c2, ···, c20) represent the coefficients of the polynomial model.
In general, the coefficients of the polynomial 3D reconstruction model can be calibrated with the least-squares algorithm [22]. The calibration data can be obtained by using the planar target and Zhang’s method [23].
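The count of 20 coefficients per axis matches the number of monomials of total degree at most three in three variables. The sketch below makes that concrete under the assumption that the three variables are the image coordinates (x, y) and the projector coordinate v; the monomial ordering and the names `cubic_monomials` and `reconstruct` are illustrative choices, not taken from the paper.

```python
from itertools import product

def cubic_monomials(x, y, v):
    """All terms x^i * y^j * v^k with i + j + k <= 3 (assumed ordering)."""
    terms = []
    for i, j, k in product(range(4), repeat=3):
        if i + j + k <= 3:
            terms.append((x ** i) * (y ** j) * (v ** k))
    return terms

def reconstruct(coeffs, x, y, v):
    """Evaluate one output coordinate as a weighted sum of the monomials."""
    return sum(c * m for c, m in zip(coeffs, cubic_monomials(x, y, v)))

terms = cubic_monomials(1.0, 2.0, 3.0)
print(len(terms))  # number of coefficients needed per output axis

# Hypothetical coefficients: only the constant term is non-zero.
coeffs = [1.0] + [0.0] * 19
print(reconstruct(coeffs, 1.0, 2.0, 3.0))
```

Because the model is linear in its coefficients, calibration reduces to an ordinary least-squares problem over the monomial values of the calibration points.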
3. Rapid Calculation Method
Since the Levenberg–Marquardt algorithm relies on iterative optimization, it yields accurate peak positions but has low computational efficiency. To address this issue, a neural network-based approach is proposed to rapidly extract the peak positions of time-domain Gaussian curves. The basic principle of this neural network-based approach is shown in Figure 2.
The proposed neural network consists of an input layer, a hidden layer, and an output layer. The intensity sequence, I_i(x, y) (i = 1, 2, ···, n), is taken as the input of the neural network. The number of neurons in the input layer is n, and the output of this layer is (α_1, α_2, ···, α_n). The hidden layer contains q neurons and yields the result (β_1, β_2, ···, β_q). The output layer finally exports the peak position, v_p, of the time-domain Gaussian curve. The weight matrix from the input layer to the hidden layer is W_h, and W_o represents the weight matrix from the hidden layer to the output layer.
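The forward pass of such a network can be sketched in a few lines. This is only an illustration under several assumptions that are not stated in the text: the input intensities are normalized to [-1, 1], each layer has a bias term, the tansig activation is taken to be the hyperbolic tangent, and the network output in [-1, 1] is mapped linearly onto the sample index range. The weights here are random and untrained, and the name `predict_peak` is hypothetical.

```python
import math
import random

def predict_peak(intensities, w_h, b_h, w_o, b_o):
    """Forward pass: n inputs -> q hidden neurons -> one peak position."""
    lo, hi = min(intensities), max(intensities)
    span = (hi - lo) or 1.0
    # Assumed preprocessing: scale the curve to [-1, 1].
    x = [2.0 * (v - lo) / span - 1.0 for v in intensities]
    alpha = [math.tanh(xi) for xi in x]                      # input layer
    beta = [math.tanh(sum(w * a for w, a in zip(row, alpha)) + b)
            for row, b in zip(w_h, b_h)]                     # hidden layer
    out = math.tanh(sum(w * b for w, b in zip(w_o, beta)) + b_o)
    n = len(intensities)
    return (out + 1.0) * 0.5 * (n - 1)   # assumed mapping to [0, n - 1]

random.seed(0)
n, q = 4, 6                              # layer sizes used in the experiments
w_h = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(q)]
b_h = [random.uniform(-1, 1) for _ in range(q)]
w_o = [random.uniform(-1, 1) for _ in range(q)]
b_o = random.uniform(-1, 1)

peak = predict_peak([20.0, 95.0, 180.0, 60.0], w_h, b_h, w_o, b_o)
print("untrained network output:", peak)
```

The point of the sketch is the cost: one prediction needs only a fixed number of multiply-adds and tanh evaluations per pixel, with no iteration.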
Actually, most time-domain Gaussian curves sample two adjacent Gaussian fringes, which causes a cyclic shift in the curves, as shown in Figure 3. For this reason, although the time-domain Gaussian curve shifts continuously, the peak position values jump abruptly in the edge region. This discontinuous correspondence makes it difficult to compute accurate peak positions with the neural network.
Therefore, before being used as the input of the neural network, the time-domain Gaussian curve should be preprocessed with an additional circular shift (see Figure 3). The shifting distance ds(x, y) can be approximately estimated by subtracting the position of the maximum value of the time-domain Gaussian curve, v_max(x, y), from the middle position, v_mid:
$$ds(x, y) = v_{mid} - v_{max}(x, y) \tag{5}$$
With the additional circular shift, the peak of the time-domain Gaussian curve is moved to the middle area (see Figure 3), and the discontinuous correspondence in the edge region is avoided. The practical process of computing peak positions with the neural network is shown in Figure 4. Since the neural network yields only the peak positions of the circularly shifted Gaussian curves, the actual peak positions, v_p, are obtained by adding the shifting distance, ds.
To determine the parameters of the neural network, the training data can be obtained using the Levenberg–Marquardt algorithm. When applying this algorithm, the initial values may significantly influence computing efficiency. It is recommended that the minimum value v_min, the maximum value v_max, and the middle position v_mid of the circularly shifted time-domain Gaussian curve be used as the initial values of λ, η, and v_p in Equation (1).
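The circular-shift preprocessing can be sketched as below. The sign convention is an assumption: here ds is the middle index minus the index of the brightest sample, the curve is rotated so that this sample lands in the middle, and the original peak is recovered by removing ds modulo the curve length. With the opposite sign convention for ds, the recovery step becomes the addition described in the text. The names `center_curve` and `restore_peak` are hypothetical.

```python
def center_curve(curve):
    """Rotate the curve so its brightest sample sits at the middle index."""
    n = len(curve)
    v_max = max(range(n), key=curve.__getitem__)   # index of the maximum
    v_mid = n // 2
    ds = v_mid - v_max
    shifted = [curve[(i - ds) % n] for i in range(n)]
    return shifted, ds

def restore_peak(peak_in_shifted, ds, n):
    """Undo the rotation for a (possibly sub-sample) peak position."""
    return (peak_in_shifted - ds) % n

# A curve whose peak straddles the edge: samples 0 and 7 are the brightest.
curve = [9, 5, 1, 0, 0, 1, 4, 8]
shifted, ds = center_curve(curve)
print(shifted, ds)

# Suppose the network reports the peak of the shifted curve at 3.6.
print(restore_peak(3.6, ds, len(curve)))
```

After the rotation the two halves of the peak are contiguous, so a smooth regressor no longer has to reproduce a jump between the first and last positions.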
4. Characteristics Analysis
4.1. Sensitivity to Defocusing Degree
The Gaussian fringes are generated by defocusing the binary multi-line patterns, and the optical defocusing blur can be described as the following:
$$F(u, v) = M(u, v) \otimes h(u, v) = \iint M(s, t)\, h(u - s, v - t)\, \mathrm{d}s\, \mathrm{d}t \tag{6}$$
where M(u, v) and F(u, v) represent the multi-line pattern and the defocused pattern, respectively, h is the defocusing PSF, ⨂ denotes the convolution operator, u and v indicate the coordinates in both directions on the defocused pattern (also the projector plane), and s and t represent the coordinates in both directions on the multi-line pattern.
For convenience, the analysis is carried out in one-dimensional space (projector axis V). First, the defocusing PSF h can be modeled as a one-dimensional Gaussian function:
$$h(v) = \frac{1}{\sqrt{2\pi}\,\sigma_h} \exp\left(-\frac{v^2}{2\sigma_h^2}\right) \tag{7}$$
where σ_h denotes the blur radius, which is related to the defocusing degree.
In one-dimensional space (projector axis V), the multi-line pattern can be described as a set of Dirac delta functions (as shown in Figure 5):
$$M(v) = \sum_{k=1}^{m} \delta(v - k v_d) \tag{8}$$
where m is the number of lines in the projector pattern, and v_d represents the distance between two adjacent lines.
After the one-dimensional convolution operation, the defocused pattern in one-dimensional space, F(v), can be described as the following:
$$F(v) = M(v) \otimes h(v) = \frac{1}{\sqrt{2\pi}\,\sigma_h} \sum_{k=1}^{m} \exp\left(-\frac{(v - k v_d)^2}{2\sigma_h^2}\right) \tag{9}$$
As illustrated in Equation (9) and Figure 5, the variation in blur radius, σ_h_ (corresponding to the varied defocusing degree), does not change the peak positions of the Gaussian fringes (in defocused pattern) as well as the time–domain Gaussian curves. It means that a varied defocusing degree theoretically has little impact on the proposed method, which achieves the projector coordinates by finding the peak position of the time–domain Gaussian curves.
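The claim that the blur radius does not move the fringe peaks can be checked numerically. The sketch below assumes the defocused pattern is a sum of identical Gaussians centered on lines at k · v_d (k = 1, ···, m), and searches for the maximum near the central line on a fine grid. The function names, the grid step, and the line count are illustrative assumptions; the blur radii 1.02 and 1.61 are the values reported in the experiments, and 2.5 is an extra test value.

```python
import math

def defocused_pattern(v, sigma_h, v_d=8.0, m=7):
    """Sum of Gaussian-blurred lines located at k * v_d, k = 1..m."""
    total = sum(math.exp(-((v - k * v_d) ** 2) / (2.0 * sigma_h ** 2))
                for k in range(1, m + 1))
    return total / (math.sqrt(2.0 * math.pi) * sigma_h)

def peak_near(center, sigma_h, half_window=2.0, step=0.01):
    """Grid search for the brightest position around one line."""
    best_v, best_f = center, -1.0
    count = int(round(2.0 * half_window / step))
    for i in range(count + 1):
        v = center - half_window + i * step
        f = defocused_pattern(v, sigma_h)
        if f > best_f:
            best_v, best_f = v, f
    return best_v

central_line = 4 * 8.0   # the middle line of seven, at k = 4
peaks = {s: peak_near(central_line, s) for s in (1.02, 1.61, 2.5)}
for sigma_h, position in peaks.items():
    print(sigma_h, round(position, 2))
```

The peak height and width change with the blur radius, but the location of the maximum stays on the line position, which is why a peak-based measurement tolerates varying defocus.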
In spite of this, the calculation accuracy of peak positions may be influenced by the distance between two adjacent lines (v_d_). When the distance is too small, there exists an overlap between the adjacent fringes, which may lead to low contrast and high noise in the captured images. This will reduce the calculation accuracy of the proposed method. Moreover, it also makes it difficult to accurately unwrap the relative projector coordinates (peak positions).
By comparison, the phase-shift algorithm [24] determines projector coordinates by calculating the phase of sinusoidal fringes. When the fringes are generated by defocusing binary patterns, ideal sinusoidal fringes can only be achieved with a specific defocusing degree. Because the defocusing degree varies across the whole DoF, ideal sinusoidal fringes exist only in a small range of the DoF. Elsewhere in the DoF, a nonsinusoidal fringe is observed, which can be treated as a combination of an ideal sinusoidal fringe and high-order harmonics [25].
where ( ) is the image of sinusoidal fringe, and N represents shifting number, ( ) are constants, is the phase, and is the phase-shifting amount.
With the high-order harmonics, the computed phase deviates from the ideal phase value. The phase error can be expressed as the following:
As shown in Equation (11), a varied defocusing degree will lead to a periodic phase error in the results of the phase-shift algorithm, which finally can reduce the accuracy of the FPP system.
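The effect of residual harmonics on a phase-shifting measurement can be illustrated with a small numerical experiment. This is a sketch under stated assumptions, not the paper's error model: the fringe is taken as a cosine plus a single third harmonic of hypothetical amplitude, the phase shifts are equally spaced over one period, and the phase is recovered with the standard N-step arctangent formula. The names `wrapped_phase` and `max_phase_error` are illustrative.

```python
import math

def wrapped_phase(phi, n_steps=4, third_harmonic=0.0):
    """N-step phase-shifting estimate for a fringe with an added 3rd harmonic."""
    num = den = 0.0
    for n in range(n_steps):
        delta = 2.0 * math.pi * n / n_steps        # phase-shifting amount
        theta = phi + delta
        intensity = 0.5 + 0.5 * math.cos(theta) + third_harmonic * math.cos(3.0 * theta)
        num -= intensity * math.sin(delta)
        den += intensity * math.cos(delta)
    return math.atan2(num, den)

def max_phase_error(third_harmonic, samples=360):
    """Largest absolute phase error over one fringe period."""
    worst = 0.0
    for i in range(samples):
        phi = -math.pi + 2.0 * math.pi * (i + 0.5) / samples
        err = wrapped_phase(phi, third_harmonic=third_harmonic) - phi
        err = (err + math.pi) % (2.0 * math.pi) - math.pi   # wrap to [-pi, pi)
        worst = max(worst, abs(err))
    return worst

print("ideal sinusoid:", max_phase_error(0.0))
print("with 3rd harmonic:", max_phase_error(0.05))
```

With four steps, the third harmonic aliases onto the fundamental, so the error repeats four times per fringe period; its size grows with the harmonic amplitude, which in a defocusing system depends on depth.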
4.2. Sensitivity to Complex Surface
When the measured object has a complex surface, the projected Gaussian fringes are modulated by the surface and become severely deformed, as shown in Figure 6a,b. In line-structured light [15] or multi-line shift profilometry [16], the 3D point cloud is obtained by finding the peak positions of the spatial distribution of the Gaussian fringes. The severely deformed Gaussian fringes make it extremely difficult to obtain accurate results for the complex surface.
As illustrated in Figure 7, for a complex surface, the camera pixels receive light emitted from changed positions in the projector pattern. This does not deform the time-domain Gaussian curve; it only introduces an extra shifting distance v_s(x, y) in the time-domain Gaussian curve, G_s, which can be expressed as the following:
where is the peak position of the time–domain Gaussian curve, G_s_.
Figure 6c,d shows that, although the projected Gaussian fringes are severely deformed on the complex surface, the extracted time-domain Gaussian curves still have an ideal shape. This characteristic of the time-domain Gaussian curves helps in computing accurate peak positions. Therefore, the proposed method is suitable for measuring complex surfaces.
5. Experiments
Experiments have been carried out to verify the performance of the proposed method. A homemade FPP system, consisting of a DLP projector (LightCrafter 4500, Wintech, Beijing, China) and a CCD camera (MER-050-560U3M, Daheng, Beijing, China) with an 8 mm lens (Computar, M0814-MP2, CBC Corporation, Tokyo, Japan), is used for the experiments. The captured images are processed in MATLAB (2012a). Two plaster statues (about 150 mm in height) and several planar targets serve as the experimental subjects. The complementary gray-code unwrapping method [20] is applied to obtain the absolute projector column coordinates, and the calibrated third-order polynomial model [21] is then used to convert the absolute projector column coordinates into height values.
In the first experiment, the proposed method is tested with the minimum number of shifting steps (n = 4) and the minimum shifting step (one column in the projector plane, dv = 1). The distance between two adjacent lines is four columns in the projector plane. The projector coordinates are computed with the Levenberg–Marquardt algorithm and Equation (1). During the experiment, four multi-line patterns are sequentially projected onto a plaster statue, and the fringe images are captured synchronously (see Figure 8a–d). The 3D reconstruction result (Figure 8e) shows that the proposed method can acquire a dense and smooth point cloud of a complex surface, which indicates that the method is suitable for measuring complex surfaces.
Although accurate projector coordinates can be achieved by using the Levenberg–Marquardt algorithm, it has a low calculation efficiency. In this experiment, 587 s are required to compute projector coordinates. The low calculation efficiency may result in the inability to make timely use of rapidly acquired 3D point cloud data.
By contrast, a neural network can rapidly yield projector coordinates. In our work, the numbers of neurons in the input layer and the hidden layer are four (n = 4) and six (q = 6), respectively. The Tansig activation function is applied in the input, hidden, and output layers. The plaster statues are placed at different depths and are sequentially illuminated by four multi-line patterns with a larger shifting step (two columns in the projector plane, dv = 2). The distance between two adjacent lines is eight columns in the projector plane. The input part of the training data (time-domain Gaussian curves) is extracted from the synchronously captured fringe images (see Figure 9a). The output part of the training data is computed by the Levenberg–Marquardt algorithm from eight multi-line patterns (eight shifting steps, each of one column in the projector plane), as shown in Figure 9b.
Figure 9d,e,g shows that, when the training data are preprocessed without the circular shift, the trained neural network tends to smooth the abrupt jump of peak positions in the edge region and thus yields inaccurate results. By comparison, these inaccurate peak positions in the edge region are effectively avoided by adding the circular shift to the preprocessing procedure (see Figure 9c).
As shown in Figure 9h, when the shifting step becomes larger (two columns in the projector plane), a periodic error also appears in the result of the Levenberg–Marquardt algorithm (with four shifting steps). In comparison, the periodic error is greatly reduced in the result of the trained neural network (Figure 9f). Most importantly, by using the neural network, the computing time is decreased significantly (from 587 s to 11 ms), which may meet the requirements for real-time measurement or detection.
Finally, the sensitivity to defocusing degree is tested, with several planar targets which are evenly placed from 0 mm to 750 mm (the interval is about 150 mm), as shown in Figure 10. For comparison, the sinusoidal patterns and the imitated sinusoidal patterns, which are generated using SBM technique and dithering technique, respectively, are applied in this experiment. The identical shifting step (four shifting steps) is applied, and the same fringe interval is used to generate a sinusoidal pattern, imitated sinusoidal pattern (SBM), and multi-line pattern in our proposed method (eight columns in the projector plane). A bigger fringe interval (16 columns in the projector plane) is used in the dithering technique. Due to the large depth between the planar targets (750 mm), the blur radius of the Gaussian fringes is also remarkably varied, from 1.61 (σ1 = 1.61) to 1.02 (σ6 = 1.02) (see Figure 10d).
The 3D reconstruction result of phase-shifting algorithm with 16 shifting steps is achieved and taken as the reference to calculate the 3D reconstruction errors of the different methods. The mean absolute errors (the average absolute value of the 3D reconstruction errors) are computed for comparison between the 3D reconstruction errors. With the varied defocusing degrees, the 3D reconstruction error of the phase-shifting algorithm with a sinusoidal pattern stays at a low level (Figure 11a,e). In comparison, the periodic errors in the reconstructed results of the SBM technique and the dithering technique increase rapidly (Figure 11b,c,e). It should be noted that the much larger error in the dithering technique just means that a greater defocusing degree is required by this technique. Comparatively, the proposed method shows much lower sensitivity to the varied defocusing degree (as shown in Figure 11d,e), and the periodic error can be suppressed without sacrificing acquisition speed. It is obvious that, with the same shifting steps, our proposed method has much greater DoF (about 750 mm) than that of the SBM technique (300 mm). The mean absolute errors are summarized in Table 1.
In this experiment, thirteen projector patterns (four multi-line patterns and nine gray-code patterns) are projected to obtain a 3D reconstruction result with the proposed method. The corresponding images are captured at a frame rate of 400 Hz, so the acquisition time of one 3D scan is 32.5 ms. By calculating the projector coordinates with the neural network model and converting them into 3D coordinates with the polynomial reconstruction model, the computing time of the proposed method is reduced to 35 ms (including 9 ms for computing projector coordinates, 11 ms for coordinate unwrapping, and 15 ms for calculating 3D coordinates). This is much shorter than that of the 3D scanning technique using the phase-shifting algorithm (358 ms).
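The timing figures quoted above follow from simple arithmetic, reproduced here as a short check. All numbers are taken from the text; the variable names are illustrative.

```python
# Acquisition: 4 multi-line patterns + 9 gray-code patterns at 400 Hz.
patterns = 4 + 9
frame_rate_hz = 400.0
acquisition_ms = patterns * 1000.0 / frame_rate_hz

# Computation: projector coordinates + unwrapping + 3D coordinates.
compute_ms = 9 + 11 + 15

print("acquisition time (ms):", acquisition_ms)
print("computing time (ms):", compute_ms)
print("speed-up vs. phase shifting:", round(358 / compute_ms, 1))
```

Since the gray-code patterns account for nine of the thirteen frames, the unwrapping stage dominates the acquisition time rather than the multi-line patterns themselves.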
With respect to the measurement accuracy of the 3D scanning technique, the reflectivity of the object surface is also an important influencing factor. Actually, the non-uniform reflectivity will lead to obvious errors in the computed peak positions. Its generation mechanism and compensation method need to be further studied.
6. Conclusions
In this paper, a time-domain Gaussian fitting method is proposed to achieve a fast scanning speed and a large DoF. The principle of determining projector coordinates with time-domain Gaussian curves is put forward first. Because the projector coordinates are computed from the peak positions of these curves, the proposed method has much lower sensitivity to the varied defocusing degree, and the DoF of 3D scanning can be extended from 150 mm to 750 mm. Moreover, the proposed method not only achieves high-speed projection of Gaussian fringes but also, by using a neural network, reduces the computing time dramatically from 587 s to 11 ms. With these advantages, the proposed method can be used for measuring large-scale parts in real time.
References
- 1. Zhong J.F., Liu D.M., Chi S.J., Tu Z., Zhong S.C. Vision-based fringe projection measurement system for radial vibration monitoring of rotating shafts. Mech. Syst. Signal Process. 2022, 181, 109467. doi:10.1016/j.ymssp.2022.109467
- 2. Juarez-Salazar R., Rodriguez-Reveles G.A., Esquivel-Hernandez S., Diaz-Ramirez V.H. Three-dimensional spatial point computation in fringe projection profilometry. Opt. Laser Eng. 2023, 164, 107482. doi:10.1016/j.optlaseng.2023.107482
- 3. Xu S.Y., Feng T.Y., Xing F.F. Three-dimensional measurement method for high dynamic range surfaces based on adaptive fringe projection. IEEE Trans. Instrum. Meas. 2023, 72, 5013011. doi:10.1109/TIM.2023.3269111
- 4. Zheng R.H., Wan M.S., Zhang W., Yu L.D. Fast and accurate 3D topography measurement based on a novel synthesis pattern method. Meas. Sci. Technol. 2023, 34, 045905. doi:10.1088/1361-6501/aca819
- 5. Zhu J.P., Feng X.Y., Zhu C.H., Zhou P. Optimal frequency selection for accuracy improvement in binary defocusing fringe projection profilometry. Appl. Opt. 2022, 61, 6897–6904. doi:10.1364/AO.464506. PMID: 36255771
- 6. Zhu S.J., Cao Y.P., Zhang Q.C., Wang Y.J. High-efficiency and robust binary fringe optimization for superfast 3D shape measurement. Opt. Express 2022, 30, 35539–35553. doi:10.1364/OE.472642. PMID: 36258503
- 7. Zheng Z.J., Gao J., Zhuang Y.Z., Zhang L.Y., Chen X. High dynamic defocus response method for binary defocusing fringe projection profilometry. Opt. Lett. 2021, 46, 3749–3752. doi:10.1364/OL.432151. PMID: 34329272
- 8. Lei Y., Zhang S. Flexible 3-D shape measurement using projector defocusing. Opt. Lett. 2009, 34, 3080–3082. doi:10.1364/OL.34.003080. PMID: 19838232
