A Time-of-Flight Imaging System Based on Resonant Photoelastic Modulation
Okan Atalar, Raphaël Van Laer, Christopher J. Sarabalis, Amir H. Safavi-Naeini, Amin Arbabian

TL;DR
This paper introduces a novel time-of-flight imaging system utilizing a resonant photoelastic modulation device to measure distances and velocities in a scene with high accuracy, based on a new free-space optical mixer design.
Contribution
The paper presents a new free-space optical mixer device using photoelastic modulation for ToF imaging, enabling high-precision distance and velocity measurements.
Findings
Designed and fabricated a photoelastic modulator-based optical mixer.
Demonstrated the system's ability to downconvert megahertz modulation frequencies.
Proposed extension for high-accuracy phase and Doppler shift measurements.
A Time-of-Flight Imaging System Based on Resonant Photoelastic Modulation
Okan Atalar
Raphaël Van Laer
Department of Applied Physics and Ginzton Laboratory, Stanford University, Stanford, California 94305, USA
Christopher J. Sarabalis
Department of Applied Physics and Ginzton Laboratory, Stanford University, Stanford, California 94305, USA
Amir H. Safavi-Naeini
Department of Applied Physics and Ginzton Laboratory, Stanford University, Stanford, California 94305, USA
Amin Arbabian
Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
Abstract
A time-of-flight (ToF) imaging system is proposed and its working principle demonstrated. To realize this system, a new device, a free-space optical mixer, is designed and fabricated. A scene is illuminated (flashed) with a megahertz level amplitude modulated light source and the reflected light from the scene is collected by a receiver. The receiver consists of the free-space optical mixer, comprising a photoelastic modulator sandwiched between polarizers, placed in front of a standard CMOS image sensor. This free-space optical mixer downconverts the megahertz level amplitude modulation frequencies into the temporal bandwidth of the image sensor. A full scale extension of the demonstrated system will be able to measure phases and Doppler shifts for the beat tones and use signal processing techniques to estimate the distance and velocity of each point in the illuminated scene with high accuracy.
1 Introduction
The human visual system and standard image sensors form high-resolution images of their surroundings. These systems are effective in forming images of the surrounding scene but do not provide accurate estimates of depth. Many applications, however, rely on accurate depth images of a scene, including machine vision [1, 2], tracking [3, 4], autonomous vehicles [5, 6, 7] and robotics [8]. The need for accurate depth images necessitates a new generation of image sensors.
Depth imaging in a scene can be achieved through the ToF imaging technique. A scene is illuminated with a controlled light source and the interaction of this light with the scene is captured and processed to estimate the depth in the scene. The most basic method for ToF imaging involves sending a focused pulse of light to a particular location in a scene and measuring the time delay of the pulse returned to the optical detector. Scanning the beam allows depth images to be generated. The beam can be scanned mechanically [9, 10] or non-mechanically (solid state). Non-mechanical scanning usually uses optical phased arrays with full control of the phase and frequency of a laser beam [11, 12, 13, 14], although recently solid-state optomechanical steering has also been proposed [15]. An alternative method, usually referred to as flash lidar, captures depth images by illuminating a part of the scene with a modulated light source. Flash lidar avoids scanning the beam by capturing a part of the scene in a single shot, making it a potentially low-cost, fast, and effective way of capturing depth images.
One class of flash lidars operates in the time domain by measuring the ToF for each sensor pixel after flashing the scene with a light source. Each point in the scene is focused onto a specific image sensor pixel by an optical lens. The ToF for the light to arrive at each sensor pixel is used to determine the distance of each point to the sensor. These flash lidars have high unambiguous range and depth resolution, but are limited by cost or spatial resolution since they require pulsed lasers and specialized pixels with high bandwidths [16, 17, 18, 19]. Compressed sensing techniques with a single-pixel camera and a pulsed laser have also been demonstrated [20], but these systems also have limited spatial resolution compared to standard image sensors.
Another class of flash lidar sends amplitude modulated light to a scene and measures the phase of the reflected light from the scene with respect to the illumination light phase, similar to the operation of stepped frequency continuous wave (SFCW) radar [21]. This technique has also been referred to as radio frequency interferometry (RFI) [22], since light is modulated at typical radar operating frequencies and the envelope of the light is used for estimating distances. To detect distances on the order of meters with sub-meter level depth resolution, megahertz modulation frequencies are used. Standard image sensors do not have the bandwidth to capture the phase of megahertz frequencies. The standard method is to demodulate the incoming megahertz frequency to a lower frequency before sampling, similar to the working principle of a superheterodyne receiver.
State of the art phase-shift based ToF imaging sensors rely on the photonic mixer device (PMD) [23]. Megahertz modulation frequencies are measured by electronic demodulation inside every pixel. These pixels are referred to as demodulation pixels [24]. Homodyne detection is usually used to sample four different phases for the illumination. Since phase is measured, there is an ambiguity in the distance when a single frequency is used, and there is a trade-off between unambiguous range and depth resolution due to the amplitude modulation frequency selected. To significantly improve the unambiguous range while retaining the depth resolution, the phase of light at multiple amplitude modulation frequencies can be measured, and signal processing techniques similar to SFCW radar can be used.
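The four-phase homodyne sampling mentioned above can be illustrated with the textbook "four-bucket" estimator. This is a minimal sketch, not the pixel-level implementation of any particular PMD sensor: the sample model `a_k = offset + amplitude * cos(phase + k*pi/2)` and the function names are assumptions made for illustration, and sign conventions differ between sensors.

```python
import math


def simulate_samples(phase, amplitude=1.0, offset=2.0):
    """Four correlation samples taken at reference shifts of 0, 90, 180, 270 degrees.

    Assumed model (illustrative): a_k = offset + amplitude * cos(phase + k*pi/2).
    """
    return [offset + amplitude * math.cos(phase + k * math.pi / 2) for k in range(4)]


def four_phase_estimate(a0, a1, a2, a3):
    """Recover phase, amplitude, and offset from the four samples.

    a0 - a2 = 2*A*cos(phase) and a3 - a1 = 2*A*sin(phase), so the constant
    background cancels in both differences.
    """
    phase = math.atan2(a3 - a1, a0 - a2) % (2 * math.pi)
    amplitude = 0.5 * math.hypot(a3 - a1, a0 - a2)
    offset = 0.25 * (a0 + a1 + a2 + a3)
    return phase, amplitude, offset


if __name__ == "__main__":
    est_phase, est_amp, est_off = four_phase_estimate(*simulate_samples(1.2))
    print(f"phase={est_phase:.6f} rad, amplitude={est_amp:.6f}, offset={est_off:.6f}")
```

Because only the phase is recovered, any distance that adds a whole number of modulation periods to the round trip produces the same four samples, which is the ambiguity discussed in this paragraph.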
ToF cameras using PMD technology or similar architectures use an image sensor with specialized pixels, and therefore have limited spatial resolution. Since these systems use non-standard image sensors, they are expensive. Additionally, detecting multiple frequencies simultaneously requires multi-heterodyne detection, which in turn requires increasingly complex "smart pixels" with large sizes, further reducing spatial resolution. Standard ToF cameras instead step through the frequencies and measure the phase at each one in turn, which increases the measurement time [25].
One common problem with flash lidars is multi-path interference (MPI). Light might bounce several times in the scene and arrive at an image sensor pixel via different paths, corrupting phase estimates and therefore distance estimates. MPI is especially problematic when there are highly reflective (specular or shiny) objects in the scene. There are solutions to overcome MPI. One approach is to use multiple frequencies to remove MPI in the scene rather than to extend the unambiguous range [26, 27]. The multiple frequencies, however, could still be used to extend the unambiguous range if MPI is corrected using other methods [28]. In the rest of this paper, we neglect MPI effects and assume they are corrected or have minimal impact on measurements, and therefore we use the available frequency support for unambiguous range extension.
One possible way of measuring the phase of the incoming light modulated at megahertz frequency with a standard image sensor per pixel is by using an optical mixer (also referred to as an optical shutter) in front of the sensor to downconvert the high frequency to a lower beat tone (heterodyne detection). The system level architecture of the ToF imaging system is demonstrated in Figure 1, which shows the three main components of the ToF imaging system: modulated light source, free-space optical mixer, and the CMOS image sensor. Such an architecture would allow the use of the most advanced state of the art image sensors, which are low cost and have high spatial resolution. Such an architecture, however, ideally requires a free-space optical mixer with wide acceptance angle, low cost, low power consumption, and centimeter level aperture to be placed in front of the image sensor for performing the heterodyne detection. The function of the optical mixer is shown in Figure 2, in which the megahertz level amplitude modulated light reflected from the scene is downconverted by the optical mixer to hertz level beat tones. This allows the image sensor to detect the beat tones, which are used to estimate distance and velocity in the scene using signal processing techniques.
There have been previous attempts at designing a free-space optical mixer; however, all of these approaches have one or more drawbacks. A mechanical shutter is not practical since megahertz modulation frequencies require extremely high rotation speeds, and this method has reliability issues due to moving parts. An image intensifier can be used for demodulation [29, 30, 31]; however, the image intensifier is large and requires high operating voltages. A Pockels cell sandwiched between polarizers can be used, but Pockels cells with centimeter-level apertures are large and have prohibitively high half-wave modulation voltages [32]. Electro-absorption in multiple quantum wells using an optical cavity can be used to modulate light [33], but this approach has a narrow acceptance angle for light due to the use of an optical cavity in the modulator. A stepped quantum well modulator (SQM) has also been used to modulate light, but this design has a limited aperture (1 mm) and uses a microscope objective to focus the received light from the scene onto the surface of the SQM [34].
To design a free-space optical mixer with low half-wave modulation voltage, a resonant device is required. We avoid using an optical cavity since an optical cavity has a narrow acceptance angle for light, so we instead use an acoustic cavity.
In this paper, the design of an optical mixer relying on the photoelastic effect is demonstrated. The photoelastic modulator is a Y-cut lithium niobate wafer and is used to modulate the polarization state of light. Sandwiching the photoelastic modulator between two polarizers forms the free-space optical mixer, which converts polarization modulation into intensity modulation. The optical mixer can be used with a standard CMOS image sensor to measure distances and velocity in a scene by flashing the scene with an amplitude modulated light source.
2 System Overview
In this paper, we demonstrate the working principle of a prototype phase-shift based ToF imaging system with a standard CMOS image sensor using a resonant photoelastic modulator. A part of a scene is illuminated with amplitude modulated light and the reflected light from the scene is downconverted by an optical mixer and then imaged on a CMOS image sensor. The optical mixer consists of a photoelastic modulator sandwiched between polarizers. The photoelastic modulator is a 0.5 mm thick and 5.08 cm diameter Y-cut lithium niobate wafer with longitudinal and transparent electrodes. The photoelastic modulator modulates the polarization of light by operating the lithium niobate wafer at its mechanical resonance modes. To demonstrate proof of concept, light of wavelength 630 nm is amplitude modulated at two frequencies and downconverted by the optical mixer such that the two beat tones fall within the bandwidth of the image sensor. We demonstrate the detection of two beat tones using heterodyne detection with a CMOS image sensor. This opens the way for simultaneous multi-frequency operation which can play a critical role as a flash lidar for various applications.
3 Polarization Modulation by Photoelastic Effect
In this section, the applied voltage to the photoelastic modulator will be related to the change in the polarization state of light passing through the modulator. The polarization modulation will be determined by calculating the modulated index ellipsoid for the photoelastic modulator.
The index ellipsoid determines how light propagates in a material. The index ellipsoid can be modulated by using the photoelastic effect. Using the piezoelectric effect, strain can be generated in a wafer to control the polarization state of light electronically by modulating the index ellipsoid. The polarization modulation should be such that the two in-plane refractive indices for the wafer are modulated by different amounts to result in an in-plane polarization rotation for light.
Photoelastic modulators are used commercially to control the polarization state of light, but they generally use a non-piezoelectric and isotropic material with transverse (parallel or nearly parallel to the incoming light direction) piezoelectric transducers to generate strain in the sample [35, 36]. This configuration automatically breaks in-plane symmetry and leads to in-plane polarization modulation. The fundamental mechanical resonance frequencies for these devices are usually in the kilohertz range due to the centimeter scale optical aperture. Higher order mechanical modes can be used to drive the modulator, but as the mode order increases, the volume average for strain in the sample decreases due to the varying sign of the strain in the sample. Therefore, using transverse electrodes for the photoelastic modulator limits the mechanical resonance frequencies to kilohertz range, greatly limiting the depth resolution of an imaging system. To achieve megahertz mechanical resonance frequencies and square-centimeter-level apertures with high modulation efficiency, the electrodes need to be placed normal to the incoming light direction. If a standard wafer of thickness 0.5 mm is used, the fundamental mechanical resonance frequency will appear at roughly 4 MHz for lithium niobate, with resonance frequencies reaching up to 100 MHz (although as the mode order increases, the modulation efficiency drops).
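The link between wafer thickness and megahertz resonance can be sanity-checked with the idealized thickness-mode relation f = v / (2t) for a free plate. This is a rough sketch under stated assumptions: the single effective acoustic velocity is inferred from the figures quoted above (0.5 mm, roughly 4 MHz) rather than taken from material data, and a real piezoelectric wafer has several mode families with different velocities, so the numbers are indicative only.

```python
def fundamental_thickness_resonance_hz(thickness_m, acoustic_velocity_m_s):
    """Fundamental thickness-mode resonance of an idealized free plate: f = v / (2 t)."""
    return acoustic_velocity_m_s / (2.0 * thickness_m)


def implied_acoustic_velocity_m_s(thickness_m, fundamental_hz):
    """Invert f = v / (2 t) to get the effective velocity implied by a quoted resonance."""
    return 2.0 * thickness_m * fundamental_hz


if __name__ == "__main__":
    # Figures quoted in the text: 0.5 mm wafer, fundamental at roughly 4 MHz.
    v_eff = implied_acoustic_velocity_m_s(0.5e-3, 4e6)
    print(f"implied effective acoustic velocity: {v_eff:.0f} m/s")
    # ASSUMPTION: the same effective velocity applies to thinner wafers.
    for thickness_mm in (0.5, 0.25, 0.1):
        f1 = fundamental_thickness_resonance_hz(thickness_mm * 1e-3, v_eff)
        print(f"t = {thickness_mm} mm -> fundamental ~ {f1 / 1e6:.1f} MHz")
```

The inverse scaling with thickness is the reason a standard sub-millimeter wafer driven through its thickness reaches megahertz frequencies, whereas a centimeter-scale transverse dimension does not.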
If an isotropic material is used for polarization modulation, applying strain in the longitudinal direction (normal to the wafer) does not result in a change in the in-plane refractive indices due to in-plane symmetry with respect to the excitation. We therefore use a Y-cut lithium niobate wafer as the photoelastic modulator, breaking in-plane symmetry and leading to a net polarization modulation when longitudinal electrodes are used to generate strain in the wafer.
Lithium niobate and many other piezoelectric materials are birefringent. Using a birefringent wafer leads to a static polarization rotation, which is different for rays incident on the wafer at different angles. Not correcting for this static birefringence will lead to a limited acceptance angle for the wafer. To correct for this static birefringence, which is standard practice in the design of wave plates, another identical wafer is placed parallel to the original wafer but rotated in plane by 90°. Figure 3 demonstrates the polarization modulation by the photoelastic modulator.
If the strain profile is uniform or nearly uniform across the cross section of the wafer, to first order a single index ellipsoid can be used to describe the polarization modulation of light as it passes through the wafer. This approximation will be used throughout this section. The unmodulated index ellipsoid for the lithium niobate wafer can be written as in (1), where $n_o$ and $n_e$ are the ordinary and extraordinary refractive indices of lithium niobate, respectively.
$$\frac{x^2}{n_o^2} + \frac{y^2}{n_o^2} + \frac{z^2}{n_e^2} = 1 \tag{1}$$
To determine the effective index ellipsoid after strain is generated in the wafer through the piezoelectric effect, the wafer will be separated into infinitesimal volumes which have an infinitesimal thickness along the y direction of the crystal and other dimensions equal to the wafer cross-section. Using the strain components, the polarization modulation can be determined for each of these infinitesimal volumes using the photoelastic effect. Let S denote the strain tensor in the wafer. The strain tensor is expressed as follows: . The modulated index ellipsoid for this infinitesimal volume is expressed in (2), where are the photoelastic constants of lithium niobate for .
[TABLE]
To first order, the effective index ellipsoid for the wafer is the arithmetic average of the index ellipsoids for these infinitesimal volumes. The effective index ellipsoid can be expressed as in (3), where is the volume average for strain component in the wafer for .
[TABLE]
We use the volume average of strain for the rest of the calculations. To determine the volume average strain tensor components generated in the lithium niobate wafer when voltage is applied through longitudinal electrodes, we simulate the wafer in the frequency domain using the mechanics and piezoelectric modules of the COMSOL [37] simulation platform.
The electrodes only cover half of the surface area for the wafer to limit clamping losses when the wafer is tested experimentally, as shown in Figure 3. For megahertz mechanical frequencies at room temperature, clamping losses are usually the dominant loss mechanism. The wafer will be clamped from the sides, therefore only the center part is deposited with aluminum wire grids and the light is passed through this section for polarization modulation.
The strain tensor components are calculated in the frequency domain from (0.1-25) MHz with a frequency stepping of 10 kHz. Since the net polarization rotation of light is important, we calculate the volume average for the strain components. It is seen from COMSOL simulations that and with respect to crystal axis are the strain components which have a significant non-zero volume average for strain. The effective index ellipsoid can therefore be expressed as (4).
[TABLE]
We apply a rotation to the yz axis such that the new form is diagonal [38]. Using the coordinate transformations in (5), (4) can be transformed into (6).
[TABLE]
[TABLE]
Since , we neglect the modulations of the and axis which include the term. We assume for our analysis that the beam is incident at an angle to the normal. Since < 1° usually, the path traversed by the beam is approximately equal to the thickness of the wafer.
[TABLE]
Figure 4 shows the volume average of the strain components in the wafer and corresponding to the region covered with longitudinal electrodes. We see resonances at multiple frequencies, but for the rest of this paper we focus on two of them: the fundamental mechanical resonance of the wafer at roughly 3.7 MHz and a second resonance at roughly 20.5 MHz. We first consider the fundamental mode at 3.7 MHz. The cross section of the wafer at the center for the and strain components around the fundamental mechanical resonance frequency and the volume average for the strain components inside the wafer are shown in Figure 5. When the wafer is driven at one of its mechanical resonance frequencies , the volume average strain components can be expressed as and . The modified index ellipsoid in this case can be expressed as in (8).
[TABLE]
The electro-optic effect is negligible compared to the photoelastic effect because the strain is strongly enhanced at the mechanical resonance of the wafer; the electro-optic effect is therefore not included in the polarization modulation calculations. In the next section, polarization modulation for an incoming beam along the y direction of the crystal will be calculated. It can be shown that the acceptance angle for this type of photoelastic modulator is roughly 20°, limited by the birefringence of the wafer, when a 0.5 mm thick wafer is used along with another identical wafer placed parallel and rotated in plane by 90°. A thinner wafer (e.g. 0.1 mm) can be used to increase the acceptance angle. A more detailed analysis covering arbitrary angles, field of view, and the electro-optic effect will be presented in future work.
3.1 Normal Incidence
In this section, the polarization modulation as a function of time will be derived assuming the incoming beam is perpendicular to the wafer (actually at an angle to the normal of the wafer) and the wafer is driven at its fundamental mechanical resonance frequency of . Another identical wafer parallel and rotated in plane by 90° is placed after the photoelastic modulator to correct for static polarization rotation of light. The incoming beam sees the refractive indices and when passing through the photoelastic modulator, where refractive index along the x and z directions are modulated by the photoelastic effect as in (8).
[TABLE]
Since , we can approximate (9) as shown in (10).
[TABLE]
[TABLE]
[TABLE]
The change in the in-plane refractive indices is expressed in (13).
[TABLE]
The polarization change of light after passing through the wafer of thickness with wavelength of light is expressed in (14).
[TABLE]
3.1.1 Depth of Polarization Modulation
In this section, the relationship between the depth of polarization modulation as a function of the applied peak-to-peak voltage to the photoelastic modulator and the quality factor of the fundamental mechanical resonance mode of the wafer will be derived. We calculate the depth of polarization modulation assuming normal incidence of light to the lithium niobate wafer at the fundamental mechanical resonance frequency for the wafer. We calculate the volume average for the two strain components ( and ) contributing to polarization modulation in the sample using COMSOL. Loss is added to the lithium niobate wafer to determine the strain components and therefore the depth of modulation at a given mechanical quality factor and voltage applied to the electrodes.
In simulation, we apply 2 V peak-to-peak to the electrodes at around the fundamental mechanical resonance frequency for the wafer (approximately 3.7 MHz). From COMSOL simulations in Figure 5, we see that the volume average at the resonance is roughly and . Using the photoelastic constants , , and from [39] with (13) and (14), the depth of polarization modulation is calculated to be 0.0715 radians for light of wavelength 630 nm. The quality factor for the wafer in the simulation with the added loss is roughly 9000 (calculated based on 3dB cut-off points for the strain around the fundamental mechanical resonance frequency). Based on these results, the depth of polarization modulation for an incident beam along the y direction of the wafer can be calculated roughly as in (15) for light of wavelength 630 nm:
[TABLE]
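The simulated operating point quoted above (0.0715 rad at 2 V peak-to-peak with a quality factor of roughly 9000, for 630 nm light) can be extrapolated to other drive conditions. The sketch below assumes that, at resonance, the depth of polarization modulation scales linearly with both the applied voltage and the mechanical quality factor; this linear scaling, the function name, and the constants' names are illustrative assumptions, and the coefficient is simply the ratio of the quoted numbers.

```python
# Reference operating point quoted in the text (light of wavelength 630 nm).
REF_DEPTH_RAD = 0.0715   # simulated depth of polarization modulation
REF_VPP = 2.0            # applied peak-to-peak voltage in the simulation
REF_Q = 9000.0           # approximate quality factor in the simulation


def polarization_depth_rad(v_pp, q_factor):
    """Depth of polarization modulation, ASSUMING linear scaling in voltage and Q."""
    return REF_DEPTH_RAD * (v_pp / REF_VPP) * (q_factor / REF_Q)


if __name__ == "__main__":
    coefficient = REF_DEPTH_RAD / (REF_VPP * REF_Q)
    print(f"depth per volt (peak-to-peak) per unit Q: {coefficient:.3e} rad")
    for v_pp, q in ((2.0, 9000.0), (10.0, 9000.0), (10.0, 1000.0)):
        print(f"V_pp = {v_pp:4.1f} V, Q = {q:6.0f} -> {polarization_depth_rad(v_pp, q):.4f} rad")
```

The linear model is only meaningful for small modulation depths and for a wafer that stays in the linear piezoelectric regime; a measured quality factor lower than the simulated one reduces the depth proportionally.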
The depth of modulation is independent of the wafer thickness to first order: for a fixed applied voltage, the electric field inside the wafer is inversely proportional to the wafer thickness, but this is compensated by the proportionally longer path traversed by the light when passing through the wafer. The acceptance angle for a 0.5 mm thick wafer is roughly 20° when the static polarization correcting wafer is also used. The acceptance angle is calculated by finding the largest incoming angle with respect to the wafer normal such that the static phase retardation between the ordinary and extraordinary rays is 90°. A thinner wafer can be used to improve the acceptance angle for the photoelastic modulator while retaining the same depth of polarization modulation.
4 Polarization Modulation Conversion to Intensity Modulation
Polarization modulation can be converted into intensity modulation by sandwiching the photoelastic modulator between two polarizers. Malus’ law governs the intensity of light transmitted through a polarizer: the transmitted intensity is scaled by the cosine squared of the angle between the polarization direction of the light and the transmission axis of the polarizer. Since standard polarizers have high extinction ratios, a high modulation depth can be realized.
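A short numerical illustration of Malus' law applied to a sinusoidally modulated polarization angle may help here. The model below is an idealized sketch: lossless polarizers, unit input intensity, and a polarization angle of the form `theta0 + depth * cos(2*pi*f*t)` are assumptions, and the function names are illustrative. The 45° bias and the 0.0715 rad depth are taken from elsewhere in the paper.

```python
import math


def malus_transmission(angle_rad):
    """Malus' law: fraction of intensity transmitted by an ideal polarizer."""
    return math.cos(angle_rad) ** 2


def modulated_intensity(t, theta0_rad, depth_rad, f_drive_hz):
    """Transmitted intensity when the polarization angle oscillates around theta0.

    ASSUMED model: angle(t) = theta0 + depth * cos(2*pi*f_drive*t).
    """
    angle = theta0_rad + depth_rad * math.cos(2 * math.pi * f_drive_hz * t)
    return malus_transmission(angle)


if __name__ == "__main__":
    theta0 = math.pi / 4   # 45 degree bias between polarization and analyzer axis
    depth = 0.0715         # depth of polarization modulation quoted in the text
    f_drive = 3.7e6        # fundamental resonance quoted in the text
    i_min = modulated_intensity(0.0, theta0, depth, f_drive)
    i_max = modulated_intensity(0.5 / f_drive, theta0, depth, f_drive)
    print(f"intensity swings between {i_min:.4f} and {i_max:.4f}")
    print(f"peak-to-peak swing {i_max - i_min:.4f} vs sin(2*depth) = {math.sin(2 * depth):.4f}")
```

At a 45° bias the transmission curve has its steepest slope, so a small polarization swing of ±depth becomes an intensity swing of sin(2·depth) around the half-transmission point.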
When the lithium niobate wafer is driven near its resonance mode(s), the intensity modulation has the form of a cosine inside a cosine (similar to frequency modulation). This expression can be expanded using the Jacobi-Anger expansion, which yields an infinite number of equally spaced frequency components. For each amplitude modulation frequency, the fundamental tone is downconverted into the bandwidth of the image sensor and used for signal processing; the other tones are low-pass filtered by the image sensor.
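The "cosine inside a cosine" expansion can be checked numerically. The sketch below uses the standard Jacobi-Anger series for cos(z cos θ) and sin(z cos θ) and computes integer-order Bessel functions from Bessel's integral, so it needs only the standard library. Writing the Malus-law intensity as ½[1 + cos(2θ_p + 2φ cos ωt)], with θ_p the polarizer angle and φ the polarization depth, is an assumption about the form of the paper's expression; the function names are illustrative.

```python
import math


def bessel_j(n, z, steps=2000):
    """Integer-order Bessel function J_n(z) from Bessel's integral (midpoint rule)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += math.cos(n * tau - z * math.sin(tau))
    return total * h / math.pi


def jacobi_anger_cos(a, z, theta, terms=10):
    """Series value of cos(a + z*cos(theta)) using the Jacobi-Anger expansion."""
    even = bessel_j(0, z) + 2 * sum(
        (-1) ** n * bessel_j(2 * n, z) * math.cos(2 * n * theta) for n in range(1, terms)
    )
    odd = 2 * sum(
        (-1) ** n * bessel_j(2 * n + 1, z) * math.cos((2 * n + 1) * theta)
        for n in range(terms)
    )
    # cos(a + x) = cos(a)cos(x) - sin(a)sin(x), with x = z*cos(theta)
    return math.cos(a) * even - math.sin(a) * odd


def fundamental_tone_amplitude(polarizer_angle_rad, depth_rad):
    """Coefficient of cos(omega*t) in 0.5*[1 + cos(2*theta_p + 2*depth*cos(omega*t))]."""
    return -math.sin(2 * polarizer_angle_rad) * bessel_j(1, 2 * depth_rad)


if __name__ == "__main__":
    a, z, theta = math.pi / 2, 0.143, 0.7
    print(jacobi_anger_cos(a, z, theta), math.cos(a + z * math.cos(theta)))
    print(fundamental_tone_amplitude(math.pi / 4, 0.0715))
```

In this form the fundamental tone carries a factor sin(2θ_p)·J₁(2φ), while the higher harmonics fall at multiples of the drive frequency and are removed by the low-pass response of the image sensor.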
The scene is illuminated with intensity modulated light at frequencies slightly detuned from the frequencies used to drive the photoelastic modulator . The light reflected from location with reflectivity in the scene is represented as , which is Doppler shifted by and phase shifted by . The received heterodyne beat signal at image sensor pixel corresponding to scene location is represented as , which carries the phase and Doppler information at a frequency ( Hz) which falls within the bandwidth of the image sensor. represents the multiple beat frequencies detected by a single image sensor pixel, where is the angle between the polarization direction of light that has passed through the photoelastic modulator and the second polarizer transmission axis. is the phase shift at the receiver of the amplitude modulated light that illuminates the scene, where is the distance of the receiver to the scene location . is the Doppler shift for the received light due to motion with velocity in the scene location . The distance and velocity of each point in the scene can be efficiently computed by performing a fast Fourier transform (FFT) with respect to time per image sensor pixel and using the phase and frequency shift information.
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Due to optical mixing, which takes place in (18), many tones are generated. The beat term which falls within the bandwidth of the image sensor is shown in (20), with .
[TABLE]
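The downconversion step can be illustrated with a small time-domain simulation. This is a simplified sketch, not the paper's full expression: the mixer is modeled as a transmission `0.5 * (1 + m * cos(2*pi*f_drive*t))` with an illustrative depth `m`, the drive frequency is scaled down to 100 kHz so the example runs quickly in pure Python (the mixing identity itself does not depend on the carrier frequency), each frame is the average of the product over one frame period, and the beat phase is read out with a single-bin discrete Fourier transform rather than the NOMP estimator used later in the paper. The 600 Hz frame rate and 80 Hz beat tone are the values from the Section 5 simulation.

```python
import cmath
import math

FRAME_RATE_HZ = 600.0   # camera frame rate used in the Section 5 simulation
BEAT_HZ = 80.0          # beat tone used in the Section 5 simulation
DRIVE_HZ = 100e3        # ASSUMPTION: scaled-down stand-in for the megahertz drive
MIXER_DEPTH = 0.14      # ASSUMPTION: illustrative intensity modulation depth


def simulate_frames(phase_rad, n_frames=60, sub_samples=1000):
    """Frame values: received intensity times mixer transmission, averaged per frame."""
    frame_period = 1.0 / FRAME_RATE_HZ
    dt = frame_period / sub_samples
    frames = []
    for k in range(n_frames):
        acc = 0.0
        for j in range(sub_samples):
            t = k * frame_period + (j + 0.5) * dt
            # Received light: amplitude modulated slightly above the drive frequency.
            received = 1.0 + math.cos(2 * math.pi * (DRIVE_HZ + BEAT_HZ) * t + phase_rad)
            # Mixer transmission oscillating at the drive frequency.
            mixer = 0.5 * (1.0 + MIXER_DEPTH * math.cos(2 * math.pi * DRIVE_HZ * t))
            acc += received * mixer
        frames.append(acc / sub_samples)
    return frames


def beat_phase(frames):
    """Phase of the beat tone from a single-bin DFT evaluated at frame centers."""
    frame_period = 1.0 / FRAME_RATE_HZ
    acc = 0j
    for k, value in enumerate(frames):
        t_center = (k + 0.5) * frame_period
        acc += value * cmath.exp(-2j * math.pi * BEAT_HZ * t_center)
    return cmath.phase(acc) % (2 * math.pi)


if __name__ == "__main__":
    frames = simulate_frames(phase_rad=0.9)
    print(f"recovered beat phase: {beat_phase(frames):.6f} rad (true value 0.9)")
```

The product of the two modulations contains a term at the difference frequency that survives the frame averaging, while the terms near the drive frequency and its double are strongly attenuated; the phase of the surviving beat tone equals the phase of the received modulation.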
Optimum depth of modulation can be calculated by optimizing , assuming the depth of polarization modulation is the same for all frequencies . The sin term is maximized when , suggesting that the transmission axes of the two polarizers should differ by 45°. Figure 6 shows the depth of modulation for different values and number of frequencies used to drive the photoelastic modulator. The intensity modulation depth drops as the number of frequencies used to drive the photoelastic modulator is increased. Since the wafer is multi-moded and several of these modes are driven with the source, many mixing terms appear in the spectrum, reducing the depth of modulation. Additionally, the polarization modulation depth for higher order mechanical modes will be smaller compared to lower order modes. An alternative to using a single wafer driven at several of its resonance frequencies is to stack wafers of different thicknesses, placed one in front of another and parallel to each other. Each wafer can then be driven at its fundamental mechanical resonance frequency, or possibly at a higher order mode.
5 Multi-Frequency Operation to Extend Unambiguous Range
Using a single frequency for distance measurements limits the unambiguous range or the depth resolution. When using a single frequency, which refers to the amplitude modulation frequency, the unambiguous range is limited to half the wavelength $\left(\frac{c}{2f_i}\right)$ corresponding to the frequency used. Using a low frequency results in a large unambiguous range, but the estimated phase needs to be accurate, since the calculated distance is directly proportional to the measured phase. Even small phase errors due to shot noise or electronic noise will lead to significant distance errors, which necessitates using megahertz frequencies. If a single frequency is used, and the range is limited to $\frac{c}{2f_i}$, the measured phase $\psi_i^*$ for the beat tone is used as in (21) to estimate the distance $d(x_k, y_l)$ corresponding to each image sensor pixel:
$$d(x_k, y_l) = \frac{c\,\psi_i^*}{4\pi f_i} \tag{21}$$
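The trade-off between unambiguous range and sensitivity to phase error can be made concrete with a few lines of arithmetic. The sketch assumes the round-trip phase model 4πdf/c (mod 2π) used in this section and takes the vacuum speed of light for the scene medium; the function names and the example phase error of 0.01 rad are illustrative.

```python
import math

C_M_PER_S = 299_792_458.0  # ASSUMPTION: vacuum speed of light for the scene medium


def unambiguous_range_m(f_mod_hz):
    """Largest distance before the round-trip phase wraps: c / (2 f)."""
    return C_M_PER_S / (2.0 * f_mod_hz)


def distance_from_phase_m(phase_rad, f_mod_hz):
    """Single-frequency distance estimate from a wrapped phase in [0, 2*pi)."""
    return C_M_PER_S * (phase_rad % (2 * math.pi)) / (4.0 * math.pi * f_mod_hz)


def distance_error_m(phase_error_rad, f_mod_hz):
    """Distance error caused by a given phase error at one modulation frequency."""
    return C_M_PER_S * phase_error_rad / (4.0 * math.pi * f_mod_hz)


if __name__ == "__main__":
    for f_mod in (100e3, 3.7e6, 97.8e6):
        print(
            f"f = {f_mod / 1e6:7.2f} MHz: range = {unambiguous_range_m(f_mod):10.2f} m, "
            f"error for 0.01 rad = {distance_error_m(0.01, f_mod) * 100:8.3f} cm"
        )
```

Raising the modulation frequency shrinks the distance error for a fixed phase error by the same factor that it shrinks the unambiguous range, which is why several frequencies are combined in the remainder of this section.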
To significantly improve the unambiguous range while retaining the depth resolution, the phase of multiple frequencies can be used after the round-trip of light, similar to the operation of SFCW radar. The standard image sensor has high angular resolution and most of the light from the scene is reflected once, therefore the limitations that apply to SFCW radar do not apply. The high angular resolution provided by the image sensor limits the number of reflectors in the scene per sensor pixel to one. This allows achieving high depth resolution and unambiguous range despite measuring the phase of the returned light at several discrete frequencies.
There are two problems that need to be addressed: the number of modulation frequencies to be used, and the reconstruction algorithm for estimating the distance and velocity per image sensor pixel. In this paper, we focus on the reconstruction algorithm for distance, and leave the selection of the modulation frequencies and velocity estimation as future work.
We first solve the problem of finding an algorithm for distance reconstruction per sensor pixel corresponding to location in the scene, assuming modulation frequencies are used for illumination, and the phase response measured at each frequency using the optical mixer and an image sensor. Maximum likelihood detection is used for distance reconstruction per image sensor pixel to maximize the probability of correct detection.
Before using the forward reconstruction algorithm for estimating the distance, we need to accurately estimate the phase of each frequency sampled by the image sensor. This is equivalent to estimating the complex gains of a noisy mixture of sinusoids, where the noise is white and follows a Gaussian distribution. The phases for the mixture of noisy sinusoids can be estimated efficiently via the Newtonized orthogonal matching pursuit (NOMP) [40]. Once the phases have been extracted, each phase can be modeled as a Gaussian distribution: $\psi_i^* \sim \mathcal{N}\!\left(\frac{4\pi d(x_k, y_l) f_i}{c} \bmod 2\pi,\ \sigma^2\right)$, with $d(x_k, y_l)$ the distance of the location $(x_k, y_l)$ in the scene to the receiver, $c$ the speed of light in the scene, and $\sigma^2$ the noise variance. Due to the phase wrapping, even if multiple frequencies are used and perfect phase information is retrieved, there will always be an ambiguous range at the least common multiple of the wavelengths corresponding to the modulation frequencies. This presents an ill-posed optimization problem due to multiple solutions. As a way around this problem, we define an unambiguous range, which is smaller than the least common multiple of the wavelengths corresponding to the modulation frequencies. In fact, this unambiguous range should be determined based on the signal-to-noise ratio (SNR) and the modulation frequencies, but that problem is not addressed in this paper.
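The distance at which all wrapped phases repeat together can be computed directly. Under the round-trip phase model above, each tone repeats every c/(2f_i), and for commensurate frequencies the least common multiple of these per-tone ranges equals c/(2g), where g is the greatest common divisor of the frequencies. The sketch uses the three modulation frequencies from the simulation at the end of this section; the function names and the vacuum speed of light are assumptions.

```python
from functools import reduce
from math import gcd, pi

C_M_PER_S = 299_792_458.0  # ASSUMPTION: vacuum speed of light for the scene medium
FREQS_HZ = (97_800_000, 19_590_000, 4_020_000)  # frequencies from the Section 5 simulation


def combined_ambiguity_m(freqs_hz):
    """Distance after which every tone's round-trip phase has wrapped a whole number of times."""
    common = reduce(gcd, freqs_hz)  # greatest common divisor, in Hz
    return C_M_PER_S / (2.0 * common)


def wrapped_phases(distance_m, freqs_hz):
    """Noise-free wrapped phases 4*pi*d*f/c (mod 2*pi) for each modulation frequency."""
    return [(4.0 * pi * distance_m * f / C_M_PER_S) % (2.0 * pi) for f in freqs_hz]


if __name__ == "__main__":
    ambiguity = combined_ambiguity_m(FREQS_HZ)
    print(f"common divisor: {reduce(gcd, FREQS_HZ)} Hz, combined ambiguity: {ambiguity:.2f} m")
    print(wrapped_phases(12.3, FREQS_HZ))
    print(wrapped_phases(12.3 + ambiguity, FREQS_HZ))
```

For these frequencies the phases only repeat exactly after several kilometers, far beyond the 100 m range selected later, so the practical limit on the unambiguous range comes from noise rather than from exact phase coincidence.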
We cast the distance estimation as an optimization problem, in which the distance within the selected unambiguous range that most likely explains the observed phases is chosen as the distance estimate per image sensor pixel. Given the estimated phase $\psi_{i}^{*}$ at each amplitude modulation frequency $f_{i}$, the selected unambiguous range, and the probability density function of a Gaussian random variable, the optimization problem can be expressed as in (22).
[TABLE]
This is a non-convex optimization problem due to phase wrapping. One possible approach is to separate the problem into bounded least-squares problems by constraining the distance such that, within each region, the objective function is convex (possibly with some approximations). The global maximum among the local maxima would then solve the non-convex problem. We leave this approach as future work and instead use a reconstruction algorithm based on forward reconstruction.
Taking the logarithm of (22), this problem is equivalent to (23), which introduces a vector of integers to account for phase wrapping.
[TABLE]
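As a reading aid, one form of the two objectives that is consistent with the Gaussian phase model above can be written out as follows. This is a sketch under stated assumptions, not a transcription of (22) and (23): the symbols $d_u$ (unambiguous range), $n$ (number of modulation frequencies), and $m_i$ (wrapping integers) are notation assumed here, and the wrapped Gaussian is approximated by its nearest-wrap term.

```latex
% Likelihood form (compare with (22)); d_u, n, m_i are assumed notation
\hat{d}(x_k, y_l) = \arg\max_{0 \le d \le d_u} \prod_{i=1}^{n}
  \max_{m_i \in \mathbb{Z}} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{\left(\psi_i^{*} - \frac{4\pi d f_i}{c} + 2\pi m_i\right)^2}{2\sigma^2} \right)

% Log form (compare with (23)): constants and the common factor 1/(2\sigma^2) drop out
\hat{d}(x_k, y_l) = \arg\min_{0 \le d \le d_u,\ \mathbf{m} \in \mathbb{Z}^{n}}
  \sum_{i=1}^{n} \left(\psi_i^{*} - \frac{4\pi d f_i}{c} + 2\pi m_i\right)^2
```

Because all frequencies are assumed to share the same noise variance $\sigma^{2}$, the log form reduces to an unweighted sum of squared wrapped phase errors.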
We use forward reconstruction to estimate the distance $d(x_{k},y_{l})$ corresponding to image sensor pixel $(x_{k},y_{l})$. We discretize the unambiguous range into candidate distances at a fixed resolution. For each candidate distance $d$ and each frequency $f_{i}$, we evaluate the phase that would have been observed if no noise had corrupted the measurements, $4\pi d f_{i}/c \pmod{2\pi}$. The distance estimate is the candidate that minimizes the objective function in (23), and this procedure is applied to each image sensor pixel to estimate the distance at every location in the scene.
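The grid search described above can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' code: the function names, the grid resolution, the test distance, and the phase noise level are assumptions, and wrapping each residual into $[-\pi, \pi)$ is used as a shortcut for choosing the best wrapping integer per frequency.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum (m/s); assumed for the scene

def wrap_to_pi(x):
    """Wrap phase differences into [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def forward_reconstruct(measured_phases, freqs_hz, max_range_m, resolution_m):
    """Grid-search distance estimate for one pixel (hypothetical helper)."""
    candidates = np.arange(0.0, max_range_m, resolution_m)
    # Noise-free round-trip phase for every (candidate distance, frequency) pair
    predicted = 4 * np.pi * np.outer(candidates, freqs_hz) / C
    # Wrapping the residual picks the nearest 2*pi multiple for each frequency
    residual = wrap_to_pi(predicted - np.asarray(measured_phases))
    cost = np.sum(residual ** 2, axis=1)
    return candidates[np.argmin(cost)]

# Example with the three simulation frequencies and an assumed true distance
freqs = np.array([97.8e6, 19.59e6, 4.02e6])
d_true = 42.37
rng = np.random.default_rng(0)
phases = np.mod(4 * np.pi * d_true * freqs / C, 2 * np.pi)
phases = phases + 0.01 * rng.standard_normal(freqs.size)  # assumed phase noise (rad)

d_est = forward_reconstruct(phases, freqs, max_range_m=100.0, resolution_m=0.005)
```

The cost of the search scales with the number of grid points times the number of frequencies per pixel, which is the price paid for avoiding the non-convex solver.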
We simulate the performance of the distance estimation algorithm per image sensor pixel assuming an unambiguous range of 100 m, a camera frame rate of 600 Hz, and shot-noise-limited measurements with 3 modulation frequencies at (97.8, 19.59, 4.02) MHz and beat tones appearing at (80, 170, 250) Hz, respectively. The performance of the algorithm for these parameters, as a function of the number of frames and the number of photons per frame per pixel, is shown in Figure 7. The average estimation error over the range 1–100 m, using 2000 photons per pixel per frame and 200 frames per distance estimate, is around 0.8 cm. Velocity estimation in a scene is not considered in this paper, but it essentially relies on the Doppler shifts of the tones. The details of the estimation algorithm, the choice of frequencies to maximize depth resolution and unambiguous range, and the extraction of velocity from the scene will be explained in future work.
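To put the simulation parameters in context, the short calculation below shows how far each modulation frequency can measure on its own before its phase wraps, and checks that the beat tones fit within the sensor bandwidth. It assumes the round-trip phase model $4\pi d f/c$ used above and the vacuum speed of light; the variable names are illustrative.

```python
C = 299_792_458.0  # speed of light in vacuum (m/s); assumed for the scene

mod_freqs_hz = [97.8e6, 19.59e6, 4.02e6]
beat_tones_hz = [80.0, 170.0, 250.0]
frame_rate_hz = 600.0

# The round-trip phase 4*pi*d*f/c wraps every c / (2 f) metres
single_freq_ranges_m = [C / (2.0 * f) for f in mod_freqs_hz]

# Beat tones must sit below the sensor's Nyquist frequency to be resolved
nyquist_hz = frame_rate_hz / 2.0
beats_resolvable = all(b < nyquist_hz for b in beat_tones_hz)

for f, r in zip(mod_freqs_hz, single_freq_ranges_m):
    print(f"{f / 1e6:6.2f} MHz wraps every {r:.3f} m")
print("beat tones below Nyquist:", beats_resolvable)
```

The individual wrap distances are roughly 1.53 m, 7.65 m, and 37.3 m, all well below 100 m, which is why the three phases must be combined to reach the simulated unambiguous range.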
6 Experiment
A Y-cut lithium niobate wafer of 0.5 mm thickness and 5.08 cm diameter is coated with an aluminum wire grid on both surfaces, with the grids aligned to attain a nearly uniform electric field inside the wafer (pointing along the y direction) while retaining optical transparency. Photolithography with a lift-off process is used to deposit a 100 nm thick aluminum wire grid over a centered area of 2.04 cm diameter on both the front and back sides of the wafer, using back-side alignment. Each aluminum wire is 4 µm wide, and adjacent wires are separated by 40 µm. Wire bonds connect the top and bottom electrode connections, which stretch from the grid-coated center to the side of the wafer, to a PCB plane. The wafer is supported on the PCB by three equally spaced nylon washers, which clamp the wafer from the sides and are fixed to it with epoxy. The prototype ToF imaging system is shown in Figure 8.
6.1 Mechanical Response
The mechanical response of the device is measured using a vector network analyzer (VNA). Figure 9 shows the mechanical frequency response of the device (scattering parameter measured with respect to 50 Ω). The fundamental mechanical resonance appears at around 4.02 MHz, and the other resonance modes are spaced by around 8 MHz, double the fundamental resonance frequency. The wafer supports modes up to 100 MHz, but the rest of this section focuses on the fundamental mechanical resonance at 4.02 MHz and the higher-order mode at 19.58 MHz. We know from the COMSOL simulations in Figure 4 that these modes should have a nonzero volume-averaged strain inside the wafer (corresponding to the COMSOL modes at 3.7 MHz and 20.5 MHz, respectively).
6.2 Optical Mixing
To observe optical mixing on the CMOS image sensor and downconvert megahertz-level amplitude modulation frequencies to the hertz range, we amplitude modulate a light-emitting diode (LED) emitting at a wavelength of 630 nm at a frequency slightly offset from the mechanical resonance frequency of the wafer. The light passes through the optical mixer, which includes the aluminum-coated lithium niobate wafer. The system consists of the amplitude-modulated LED, a polarizer, the aluminum-coated lithium niobate wafer (photoelastic modulator) driven at one or more resonance frequencies, a lithium niobate wafer rotated by 90°, and another polarizer. We observe optical mixing at 4.02 MHz when the wafer is driven at resonance and the LED is detuned in frequency by 100 Hz. We also observe mixing when the higher-order mode is driven at around 19.58 MHz and the LED is detuned by 60 Hz. Multi-heterodyne detection is also observed, in which two tones are driven simultaneously (4.02 MHz and 19.58 MHz) and the beat tones are placed at 60 Hz and 100 Hz, respectively. The mixing terms are shown in Figure 10. The figure also shows that the photoelastic effect causes the optical mixing: when the frequency supplied to the photoelastic modulator is swept around the fundamental mechanical resonance frequency, the beat tone signal level (appearing at 100 Hz) changes and exhibits resonant behavior.
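The downconversion can be illustrated with a simplified intensity model, in which the received light is modulated at $f_0 + \Delta f$ and the mixer transmission is modulated at $f_0$. The symbols $m_L$ and $m_P$ (modulation depths of the LED and the mixer) and $\phi$ (phase of the received modulation) are assumptions introduced for this sketch; the actual transfer function of the polarizer and wafer stack may contain additional terms.

```latex
% Simplified mixing model; m_L, m_P, \phi are assumed notation
\left[1 + m_L \cos\!\big(2\pi (f_0 + \Delta f) t + \phi\big)\right]
\left[1 + m_P \cos(2\pi f_0 t)\right]
= 1 + m_L \cos\!\big(2\pi (f_0 + \Delta f) t + \phi\big) + m_P \cos(2\pi f_0 t)
+ \frac{m_L m_P}{2} \cos(2\pi \Delta f\, t + \phi)
+ \frac{m_L m_P}{2} \cos\!\big(2\pi (2 f_0 + \Delta f) t + \phi\big)
```

With $f_0 = 4.02$ MHz and $\Delta f = 100$ Hz, every term except the constant and the $\Delta f$ term oscillates at megahertz rates and averages out over the sensor exposure, leaving a 100 Hz beat tone that carries the phase $\phi$.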
6.3 Analyzing Experimental Results
The depth of intensity modulation is 0.1% when 20 V peak-to-peak is applied to the wafer. The fundamental mechanical resonance mode at 4.02 MHz has a quality factor of roughly 11,000. The depth of modulation for the LED was 12%. Using (15) and (20), the expected depth of modulation can be calculated as 1.75%.
The discrepancy between the expected and measured depth of modulation could be due to misalignment between the wafers (leading to constructive and destructive interference as a result of static polarization). Another possible source is the open-loop operation of the photoelastic modulator. Since the fundamental mechanical mode has a high quality factor, the device needs to be operated at resonance to achieve high modulation depth, and even small frequency drifts of the fundamental mode should be tracked with a closed-loop system (e.g., a phase-locked loop).
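The sensitivity to frequency drift can be estimated from the reported quality factor. The sketch below assumes a single isolated mode with a Lorentzian amplitude response, which is a textbook approximation rather than a model taken from this paper; the function names are illustrative.

```python
import math

def resonance_linewidth_hz(f0_hz, q):
    """Full width at half maximum power of a resonance with quality factor q."""
    return f0_hz / q

def relative_amplitude(detuning_hz, f0_hz, q):
    """Amplitude relative to on-resonance drive (Lorentzian approximation, assumed)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * q * detuning_hz / f0_hz) ** 2)

f0 = 4.02e6   # fundamental mechanical resonance (Hz)
q = 11_000    # reported quality factor

linewidth = resonance_linewidth_hz(f0, q)
amp_half_linewidth = relative_amplitude(linewidth / 2.0, f0, q)
amp_1khz = relative_amplitude(1_000.0, f0, q)

print(f"linewidth: {linewidth:.1f} Hz")
print(f"relative amplitude at half-linewidth detuning: {amp_half_linewidth:.3f}")
print(f"relative amplitude at 1 kHz detuning: {amp_1khz:.3f}")
```

Under this approximation the resonance is only a few hundred hertz wide, so a drift of well under 0.01% of the drive frequency already reduces the mechanical amplitude noticeably.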
The observed optical mixing shows that the photoelastic modulator is a promising optical mixer. Depth of modulation can be improved through closed-loop driving to track any resonance drifts, aligning the optical components and re-fabricating the device to attain higher mechanical Q. Future work will focus on using the imaging system to form depth images in a scene.
7 Conclusion
The working principle of a prototype phase-shift-based ToF imaging system is demonstrated. The system uses an optical mixer, consisting of a photoelastic modulator sandwiched between polarizers, placed in front of a standard CMOS image sensor. The photoelastic modulator is a Y-cut lithium niobate wafer with a thickness of 0.5 mm and a diameter of 5.08 cm. The photoelastic modulator is significantly more efficient than an electro-optic modulator for polarization modulation owing to the high mechanical Q and the strong piezoelectricity and photoelasticity of lithium niobate. The demonstrated working principle includes polarization modulation through the resonant photoelastic effect, conversion of polarization modulation to intensity modulation, and multi-frequency operation by simultaneously driving the photoelastic modulator at several of its mechanical resonance frequencies. We have demonstrated that with the addition of a cost-effective, compact optical mixer, a standard image sensor can function as a high-resolution flash lidar system.
Acknowledgments
This work is supported in part by Stanford SystemX Alliance, Office of Naval Research, and NSF ECCS-1808100. Device fabrication was performed at the Stanford Nano Shared Facilities (SNSF) and the Stanford Nanofabrication Facility (SNF), supported by the National Science Foundation under award ECCS-1542152.
References
- [1] J. Han, L. Shao, D. Xu, and J. Shotton, “Enhanced Computer Vision with Microsoft Kinect Sensor: A Review,” IEEE Transactions on Cybernetics, vol. 43, no. 5, pp. 1318–1334, 2013.
- [2] A. Kadambi and R. Raskar, “Rethinking Machine Vision Time of Flight with GHz Heterodyning,” IEEE Access, vol. 5, pp. 26211–26223, 2017.
- [3] M. Hammer, M. Hebel, and M. Arens, “Automated object detection and tracking with a flash LiDAR system,” in Electro-Optical Remote Sensing X, vol. 9988, p. 998803, International Society for Optics and Photonics, 2016.
- [4] A. Dewan, T. Caselitz, G. D. Tipaldi, and W. Burgard, “Motion-based Detection and Tracking in 3D LiDAR Scans,” in Robotics and Automation (ICRA), 2016 IEEE International Conference on, pp. 4508–4513, IEEE, 2016.
- [5] J. Choi, S. Ulbrich, B. Lichte, and M. Maurer, “Multi-Target Tracking using a 3D-Lidar Sensor for Autonomous Vehicles,” in Intelligent Transportation Systems-(ITSC), 2013 16th International IEEE Conference on, pp. 881–886, IEEE, 2013.
- [6] B. Schwarz, “LIDAR: Mapping the world in 3D,” Nature Photonics, vol. 4, no. 7, p. 429, 2010.
- [7] R. Domínguez, E. Onieva, J. Alonso, J. Villagra, and C. González, “LIDAR based Perception Solution for Autonomous Vehicles,” in Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on, pp. 790–795, IEEE, 2011.
- [8] M. Bansal, B. Matei, B. Southall, J. Eledath, and H. Sawhney, “A LIDAR Streaming Architecture for Mobile Robotics with Application to 3D Structure Characterization,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on, pp. 1803–1810, IEEE, 2011.
