Lidar-Monocular Visual Odometry with Genetic Algorithm for Parameter Optimization
Adarsh Sehgal, Ashutosh Singandhupe, Hung Manh La, Alireza Tavakkoli,, Sushil J. Louis

TL;DR
This paper enhances Lidar-Monocular Visual Odometry (LIMO) by applying a genetic algorithm to optimize its parameters, significantly improving localization accuracy on the KITTI dataset.
Contribution
It introduces a genetic algorithm-based parameter optimization method for LIMO, addressing manual tuning challenges and improving odometry performance.
Findings
Genetic algorithm reduces translation error in LIMO.
Optimized parameters improve localization accuracy.
Method validated on KITTI dataset.
Abstract
Lidar-Monocular Visual Odometry (LIMO), a odometry estimation algorithm, combines camera and LIght Detection And Ranging sensor (LIDAR) for visual localization by tracking camera features as well as features from LIDAR measurements, and it estimates the motion using Bundle Adjustment based on robust key frames. For rejecting the outliers, LIMO uses semantic labelling and weights of the vegetation landmarks. A drawback of LIMO as well as many other odometry estimation algorithms is that it has many parameters that need to be manually adjusted according to the dynamic changes in the environment in order to decrease the translational errors. In this paper, we present and argue the use of Genetic Algorithm to optimize parameters with reference to LIMO and maximize LIMO's localization and motion estimation performance. We evaluate our approach on the well known KITTI odometry dataset and…
| Parameters | LIMO | GA-LIMO |
|---|---|---|
| 0.95 | 0.986 | |
| 400 | 999 | |
| 400 | 960 | |
| 400 | 859 | |
| 0.9 | 0.128 |
| Parameters | LIMO | GA-LIMO |
|---|---|---|
| 0.95 | 0.958 | |
| 400 | 999 | |
| 400 | 593 | |
| 400 | 877 | |
| 0.9 | 0.813 |
| Parameters | LIMO | GA-LIMO |
|---|---|---|
| 0.95 | 0.963 | |
| 400 | 999 | |
| 400 | 554 | |
| 400 | 992 | |
| 0.9 | 0.971 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Lidar-Monocular Visual Odometry with Genetic Algorithm for Parameter Optimization
Adarsh Sehgal, Ashutosh Singandhupe, Hung Manh La, Alireza Tavakkoli, Sushil J. Louis Adarsh Sehgal, Ashutosh Singandhupe and Dr. Hung La are with the Advanced Robotics and Automation (ARA) Laboratory. Dr. Alireza Tavakkoli is with the Computer Vision Laboratory (CVL). Dr. Sushil J. Louis is with the Evolutionary Computing Systems Lab, University of Nevada, Reno, NV 89557, USA. Corresponding author: Hung La, email: [email protected] material is based upon work supported by the National Aeronautics and Space Administration (NASA) Grant No. NNX15AI02H issued through the NVSGC-RI program under sub-award No. 19-21, and the RID program under sub-award No. 19-29, and the NVSGC-CD program under sub-award No. 18-54. This work is also partially supported by the Office of Naval Research under Grant N00014-17-1-2558.
Adarsh Sehgal, Ashutosh Singandhupe, Hung Manh La, Alireza Tavakkoli, Sushil J. Louis Adarsh Sehgal, Ashutosh Singandhupe and Dr. Hung La are with the Advanced Robotics and Automation (ARA) Laboratory. Dr. Alireza Tavakkoli is with the Computer Vision Laboratory (CVL). Dr. Sushil J. Louis is with the Evolutionary Computing Systems Lab, University of Nevada, Reno, NV 89557, USA. Corresponding author: Hung La, email: [email protected] material is based upon work supported by the National Aeronautics and Space Administration (NASA) Grant No. NNX15AI02H issued through the NVSGC-RI program under sub-award No. 19-21, and the RID program under sub-award No. 19-29, and the NVSGC-CD program under sub-award No. 18-54. This work is also partially supported by the Office of Naval Research under Grant N00014-17-1-2558.
Abstract
Lidar-Monocular Visual Odometry (LIMO), a odometry estimation algorithm, combines camera and LIght Detection And Ranging sensor (LIDAR) for visual localization by tracking camera features as well as features from LIDAR measurements, and it estimates the motion using Bundle Adjustment based on robust key frames. For rejecting the outliers, LIMO uses semantic labelling and weights of the vegetation landmarks. A drawback of LIMO as well as many other odometry estimation algorithms is that it has many parameters that need to be manually adjusted according to the dynamic changes in the environment in order to decrease the translational errors. In this paper, we present and argue the use of Genetic Algorithm to optimize parameters with reference to LIMO and maximize LIMO’s localization and motion estimation performance. We evaluate our approach on the well known KITTI odometry dataset and show that the genetic algorithm helps LIMO to reduce translation error in different datasets.
I INTRODUCTION and RELATED WORK
Motion estimation has long been a popular subject of research in which many techniques have been developed over the years [1]. Much work has been done related to Visual Simultaneous Localization and Mapping (VSLAM), also referred to as Visual Odometry [2], which simultaneously estimates the motion of the camera and the 3D structure of the observed environment. A recent review of SLAM techniques for autonomous car driving can be found in [3]. Bundle Adjustment is the most popular method for VSLAM. Bundle Adjustment is a procedure of minimizing the re-projection error between the observed point (landmarks in reference to LIMO) and the predicted points. Recent developments make use of offline VSLAM for mapping and localization [4, 5, 6].
Figure 1 illustrates the structure of the VSLAM pipeline [7]. Algorithms such as ROCC [8] and SOFT [9] rely on pre-processing and feature extraction which is in very contrast to most of the methods that obtain scale information from a camera placed at a different viewpoint[6, 10, 11, 12]. The former mentioned algorithms (SOFT and ROCC) extract robust and precise features and select them using special techniques, and has managed to attain high performance on the KITTI Benchmark [13], even without Bundle Adjustment.
The major disadvantage of stereo camera is it’s reliance on extrinsic camera calibration. It was later observed that the performance can be enhanced by learning a compensation of the calibration bias through a deformation field [14]. LIght Detection And Ranging sensor LIDAR-camera calibration is also an expanding topic of research [15, 16] with accuracy reaching a few pixels. Previous work has been done with VSLAM and LIDAR [17, 18, 19, 20]. Lidar-Monocular Visual Odometry (LIMO) [7], uses feature tracking capability of the camera and combines it with depth measurements from a LIDAR but suffers from translation and rotation errors. Later on, we describe our approach for increasing LIMO’s robustness to translation errors.
LIMO takes advantage of the depth information from LIDAR which is to be used for feature detection in the image. Outliers are rejected if do not meet the local plane assumptions, and points on the ground plane are treated for robustness. As illustrated in figure 1, in the VSLAM pipieline, the depth information is fused with monocular feature detection techniques. Another approach is taken for prior estimation, landmark selection and key frame selection to fulfill real time constraints. Unlike the approach in [18], LIMO does not use any LIDAR-SLAM algorithms such as Iterative Closest Point (ICP). The major drawback of LIMO is that it has many parameters, which needs to be manually tuned. LIMO suffers from translation and rotation errors even more than existing algorithms such as Lidar Odometry and Mapping (LOAM) [20] and Vision-Lidar Odometry and Mapping (V-LOAM) [23]. Typically, researchers tune parameters (in LIMO as well) in order to minimize these errors but there always exists the possibility of finding better parameter sets that may be optimized for specific camera and LIDAR hardware or for specific scenarios. Hence, there is a need to use optimization algorithms to increase LIMO’s performance. In this paper, we propose using a genetic algorithm (GA) to efficiently search the space of possible LIMO parameter values to find precise parameter that maximizes performance. Our experiments with this new GA-LIMO algorithm show that GA-LIMO performs little better than stock LIMO.
Much empirical evidence shows that evolutionary computing techniques such as Genetic Algorithms (GAs) work well as function optimizers in poorly-understood, non-linear, discontinuous spaces [24, 25, 26, 27, 28]. GAs [29, 30] and the GA operators of crossover and mutation [31] have been tested on numerous problems. Closer to our research, GAs have been applied to early SLAM optimization problems [32], mobile localization using ultrasonic sensors [33] [34], and in deep reinforcement learning [35]. This provides good evidence for GA efficacy on localization problems, and our main contribution in this paper is a demonstration of smaller translation error when using a GA to tune LIMO parameter values compared to the stock LIMO algorithm [7]. Our experiments show that translation error is non-linearly related to LIMO parameters, that is, translation error can vary non-linearly based on the values of the LIMO’s parameters. The following sections describe the LIMO, the GA and GA-LIMO algorithms. We then show results from running LIMO with GA tuned parameters on the KITTI odometry data sequences [36].
This paper is organized as follows: Section 2 describes the LIMO and GA algorithms. Section 3 describes the combined GA-LIMO algorithm. Section 4 then specifies our experiments and provides results. The last section delivers conclusions and possible future work.
II BACKGROUND
In this section, we present prior work related to our GA-LIMO algorithm. We first describe the VSLAM pipeline and then the LIMO algorithm.
II-A Feature extraction and pre-processing
Figure 1 shows feature extraction’s procedure in the pipeline. Feature extraction consists of tracking the features and associating the features using the Viso2 library. [11] It is further used to implement feature tracking which comprises of non-maximum suppression, sub-pixel refinement and outlier rejection by flow. Deep learning is used to reject landmarks that are moving objects. The neighborhood of the feature point in a semantic image [37] is scanned, and if the majority of neighboring pixels categorize to a dynamic class, like vehicle or pedestrian, the landmark is excluded.
II-B Scale Estimation
For scale estimation, the detected feature points from camera is mapped to the depth extracted from LIDAR. LIMO uses a one shot depth estimation approach. Initially LIDAR point cloud is transformed into the camera frame and then it is projected onto the image plane. In detail, the following steps are executed for every feature point :
A region of interest is selected around which is a set consisting of projected LIDAR points. 2. 2.
A new set called foreground set is created by segmenting the elements of . 3. 3.
The elements of are fitted with a plane . A special fitting algorithm is used in case belongs to the ground plane. 4. 4.
To estimate the depth, is intersected with the line of sight corresponding to . 5. 5.
For the previous estimated depth a test is performed. Depth estimates that are more than 30m are rejected since they could be uncertain. In addition, the angle between the line of sight of the feature point and the normal of the plane must be smaller than a threshold.
From the point clouds, neighborhoods for ordered point clouds can be selected directly. However, projections of the LIDAR points on the image are used, and the points within a rectangle in the image plane around are selected in case the point clouds are unordered. Before the plane estimation is performed, the foreground is segmented. In the next step, a histogram of depth having a fixed bin width of m is created and interpolated with elements in . LIDAR points of the nearest bin is used to perform segmentation using all detected feature points. For estimating the local surface around precisely, fitting the plane to can help. Three points are chosen from the points in , which traverse the triangle with maximum area. Depth estimation is avoided if the area of is too small, to evade incorrectly estimated depth.
However, the above technique cannot be used to estimate the the depth of points on the ground plane because LIDAR has a lower resolution in the perpendicular direction than in a level direction. A different approach is followed to enable depth estimation for a relevant ground plane. For solving this, RANSAC with refinement is used on the LIDAR point cloud to extract the ground plane[5]. In order to estimate feature points on the road, points that corresponds to the ground plane are segmented. Outliers are extracted by allowing only local planes that lie close to the ground plane.
II-C Frame to Frame Odometry
Perspective-n-Point-Problem [5] serves as the starting point of the frame to frame motion estimation.
[TABLE]
[TABLE]
where is the observed feature point in the current frame, is the 3D-point corresponding to , the transform from the previous to the current frame is denoted by freedom , which has three translation and three rotation degrees of freedom. While is the projection function from the 3D to 2D domain. The extracted features with valid estimated depth depth from the environments that has low structure and large optical flow may be very small. LIMO introduces epipolar error as [4].
[TABLE]
where fundamental matrix can be calculated from the intrinsic calibration of the camera and from the frame to frame motion of the camera. LIMO suggests the loss function to be Cauchy function [4]: , where is the fix outlier threshold. For frame to frame motion estimation, the optimization problem can be denoted as:
[TABLE]
II-D Backend
LIMO proposes a Bundle Adjustment framework based on keyframes , with key components as selection of keyframes, landmark selection, cost functions and robustification measures. The advantage with this approach is that it retains the set that carries information which are required for accurate pose estimation as well as excludes the unnecessary measurements. Keyframes are classified as required, rejected and sparsified keyframes. Required frames are crucial measurements. Frame rejection is done when the vehicle does not move. The remaining frames are collected, and the technique selects frames every 0.3s. Finally in keyframe selection, length of optimization window is chosen.
An optimal set of landmarks should be well observable, small, free of outliers and evenly distributed. Landmark selection divides all landmarks into three bins, near, middle and far, each of which have fixed number of landmarks selected for the Bundle Adjustment. Weights of the landmarks are then determined based on the semantic information. The estimated landmark depth is taken into consideration by an additional cost function,
[TABLE]
where denotes the landmark, mapping from world frame to camera frame and denotes the depth estimate. The indices denote combination of landmark-pose. A cost function punishes deviations from the length of translation vector,
[TABLE]
where , are the last two poses in the optimization window and , where is a constant with value of before optimization.
While outliers need to be removed because they do not let Least-Square methods to converge [38, 39], semantics and cheirality only does preliminary outlier rejection. The LIMO optimization problem can now be formulated as:
[TABLE]
where is the re-projection error, and weights , and are used to scale the cost functions to the same order of magnitude.
II-E Genetic Algorithm (GA)
GA [24, 29, 40, 25] were designed to search poorly-understood spaces, where exhaustive search may not be feasible (because of search space and time space complexity), and where other search techniques perform poorly. A GA as a function optimizer tries to maximize a fitness tied to the optimization objective. In general, evolutionary computing algorithms, and specifically GAs have had verifiable success on a diversity of difficulty, non-linear, design and optimization problems. GAs usually start with a randomly initialization population of candidate solution encoded in a string. Selection operators then focus search on higher fitness areas of search space whereas crossover and mutation operators generate new candidate solutions. We next explain the specific GA used in this paper.
III GA-LIMO algorithm
In this section, we present the main contribution of our paper. The specific GA searches through the space of parameter values used in LIMO for the values that maximizes the performance and minimizes the translation error as a result of pose estimation. We are targeting the following parameters: outlier rejection quantile ; maximum number of landmarks for near bin ; maximum number of landmarks for middle bin ; maximum number of landmarks for far bin and weight for the vegetation landmarks . As described in the background section, rejecting outliers, , plays an important role in converging to minimum, the weight of outlier rejection thus has notable impact on the translation error. The landmarks are categorized into three bins, which also have great significance in translation error calculation. Trees that have a rich structure results in feature points that are good to track, but they can move. So, finding an optimal weight for vegetation can significantly reduce translation error. and range from 0 to 1, while , and range from 0 to 999. We have set these ranges based on early experimental results.
Our experiments show that adjusting the values of parameters did not decrease or increase the translation error in a linear or easily appreciable pattern. So, a simple hill climber will probably not do well in finding optimized parameters. We thus use a GA to optimize these parameters.
Algorithm 1 explains the combination of LIMO with the GA, which uses a population size of 50 runs for 50 generations. We used ranking selection [41] to select the parents for crossover and mutation. Rank selection probabilistically selects higher ranked (higher fitness) individuals. Unlike fitness proportional selection, ranking selection pays attention to the existence of a fitness difference rather than also to the magnitude of fitness difference. Children are generated using uniform crossover [42], which are then mutated using flip mutation [25]. Chromosomes are binary encoded with concatenated parameters. and are considered up to three decimal places, which means a step size of 0.001, because changes in values of parameters cause considerable change in translation error. All the parameters require 11 bits to represent their range of values, so we have a chromosome length of bits, with parameters arranged in the order: , , , , .
The algorithm starts with randomly generating a population of individuals. Each chromosome is sent to LIMO to evaluate. LIMO evaluates the parameter set represented by this individual by using those parameters to run on the KITTI dataset [13]. The KITTI benchmarks are well known and provide the most popular benchmark for Visual Odometry and VSLAM. This dataset has rural, urban scenes along with highway sequences and provides gray scale images, color images, LIDAR point clouds and their calibration. Most LIMO configurations are as in [7]. In our work, we focus on two sequences in particular: sequence 01 and 04. Sequence 01 is a highway scenario, which is challenging because only a road can be used for depth estimation. Sequence 04 is an urban scenario, which has a large number of landmarks for depth estimation. We consider both sequences for each GA evaluation because we want a common set of parameters that work well with multiple scenes.
The fitness of each chromosome is defined as the inverse of translation error. This translates the minimization of translation error into a maximization of fitness as required for GA optimization. Since each fitness evaluation takes significant time, an exhaustive search of the size search space in not possible, hence our using the GA. During a fitness evaluation, the GA first runs the LIMO with sequence 01. It then compares the LIMO estimated poses with ground truth (also found in [13]) and finds the translation error using the official KITTI metric [13]. The same steps are followed for sequence 04. The fitness value of each chromosome is the average of the inverse translation errors from the two sequences.
[TABLE]
Selected chromosomes (ranked selection) are then crossed over and mutated to create new chromosomes to form the next population. This starts the next GA iteration of evaluation, selection, crossover, and mutation. The whole system takes significant time since we are running LIMO evaluations to determine the best parameters. The next section shows our experiments with individual and combined sequences, with and without the GA. Our results show that the GA-LIMO performs better than the results of LIMO [7].
IV EXPERIMENT and RESULTS
In this section we show our experiments with individual KITTI sequences, a combination of sequences, and overall results. First, we run the GA-LIMO with sequences 01 and 04 separately. We show the translation error and the error mapped onto trajectory, compared to the ground truth (reference) [36]. We then, show our results when GA-LIMO runs with evaluations on both sequences 01 and 04. Finally, we compare the values of parameters found by GA-LIMO versus LIMO.
Figure 2 shows camera data while GA-LIMO is estimating the pose from that data in figure 3. Figure 4 compares LIMO performance with GA-LIMO on sequence 04. Here the GA was run on this sequence individually to find the optimal parameters, as in algorithm 2. Absolute Pose Error (APE) and Root Mean Squared Error (RMSE) are one of the important measures [43]. The translation error for each sequence is the RMSE calculated with respect to ground truth. Figure 4(a) compares the translation error over the poses, while figure 4(b) compares the error mapped onto the trajectory with the zoomed in trajectory. Table I compares the values of parameters for LIMO and GA-LIMO. Our results show that the GA-LIMO trajectory is closer to ground truth compared to LIMO. We found that the translation error was 0.56% with GA-LIMO, in contrast to 1.01% with LIMO.
Figure 5 compares the performance of LIMO with GA-LIMO when the system is run on just sequence 01. Here first, the GA was run on sequence 01 (Algorithm 2) and the optimal parameters were used to test the same sequence. Table II compares the original and GA found parameter values. Figure 5(a) compares translation error, while figure 5(b) shows the error mapped onto the trajectory for LIMO and GA-LIMO. As shown in the zoomed in figure 5(b), GA-LIMO is closer to the ground truth. The translation error for LIMO is found to be around 3.71% and 3.8% in case of GA-LIMO, with sequence 01. GA found parameters that did not out perform the original parameters, when GA-LIMO was run on just sequence 01.
We finally ran the system with both sequences 01 and 04 as described in Algorithm 1. The fitness of each evaluation is the average of translation errors of the sequences when run using the input parameters. The parameters found in GA-LIMO as shown in table III, were then tested on sequences sequences 00, 01 and 04, as shown in figure 6 and 7. It is evident that GA-LIMO performed better than LIMO in all three sequences. The zoomed in figures show a closer view on one part of the trajectories. GA-LIMO trajectories are closer to the ground truth and have lesser translation errors. GA-LIMO has a translation error of 5.13% with sequence 00, 3.59% with sequence 01 and 0.65% with sequence 04, in contrast with 5.77% with sequence 00, 3.71% with sequence 01 and 1.01% with sequence 04 using original parameters.
Our method helped to find common set of optimal parameters which works better and hence lead to better performance in different kinds of environments.
V DISCUSSION and FUTURE WORK
This paper shows results that demonstrated that the genetic algorithm can tune LIMO parameters to achieve better performance, reduced translation error, across a range of scenarios. We discussed existing work on VSLAM, presented an algorithm to integrate LIMO with GA to find LIMO parameters that robustly minimize translation error, and explained why a GA might be suitable for such optimization. Initial results had the assumption that GAs are a good fit for such parameter optimization, and our results show that the GA can find parameter values that lead to faster learning and better (or equal) performance. We thus provide further evidence that heuristic search as performed by genetic and other similar evolutionary computing algorithms are a viable computational tool for optimizing LIMO’s performance.
APPENDIX
Open source code for this paper is available on github: https://github.com/aralab-unr/LIMOWithGA. The parameters used in this paper: outlier rejection quantile ; maximum number of landmarks for near bin ; maximum number of landmarks for middle bin ; maximum number of landmarks for far bin ; and weight for the vegetation landmarks , corresponds to outlier_rejection_quantile, max_number_landmarks_near_bin, max_number_landmarks_middle_bin, max_number_land- marks_far_bin, shrubbery_weight, respectively in the code.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] D. Cremers, “Direct methods for 3d reconstruction and visual slam,” in 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA) . IEEE, 2017, pp. 34–38.
- 2[2] T. Taketomi, H. Uchiyama, and S. Ikeda, “Visual slam algorithms: A survey from 2010 to 2016,” IPSJ Transactions on Computer Vision and Applications , vol. 9, no. 1, p. 16, 2017.
- 3[3] A. Singandhupe and H. La, “A review of slam techniques and security in autonomous driving,” in IEEE International Conference on Robotic Computing (IRC) . Italy, 2019, pp. 602–607.
- 4[4] R. Hartley and A. Zisserman, Multiple view geometry in computer vision . Cambridge university press, 2003.
- 5[5] R. Szeliski, Computer vision: algorithms and applications . Springer Science & Business Media, 2010.
- 6[6] M. Sons, H. Lategahn, C. G. Keller, and C. Stiller, “Multi trajectory pose adjustment for life-long mapping,” in 2015 IEEE Intelligent Vehicles Symposium (IV) . IEEE, 2015, pp. 901–906.
- 7[7] J. Graeter, A. Wilczynski, and M. Lauer, “Limo: Lidar-monocular visual odometry,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2018, pp. 7872–7879.
- 8[8] M. Buczko and V. Willert, “Flow-decoupled normalized reprojection error for visual odometry,” in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC) . IEEE, 2016, pp. 1161–1167.
