TL;DR
This paper introduces a globally optimal method for estimating the vertical direction in Atlanta world scenes, reducing computational complexity by focusing solely on the vertical frame and employing novel bounds for convergence.
Contribution
It proposes a new vertical direction estimation approach using branch-and-bound with innovative bounds, avoiding prior knowledge of horizontal frames and improving efficiency.
Findings
Successfully estimates vertical direction in synthetic and real data
Guarantees global optimality with novel bounds
Handles increased horizontal frames efficiently
| Methods | Upper bound | Lower bound | Searching domain |
|---|---|---|---|
| RS | | | 3D cube (side $=2\pi$) |
| Exp-BnB | | | 2D square (side $=\pi$) |
| Ste-circle-BnB | | | 2D square (side $=2$) |
| Ste-square-BnB | | | 2D square (side $=2$) |
| SCS-BnB | | | 2D rectangle (azimuth × elevation) |
| Methods | median time(s) | median iteration |
|---|---|---|
| Exp-BnB | 0.134 | 816 |
| Ste-circle-BnB | 0.184 | 1010 |
| Ste-square-BnB | 4.256 | 478 |
| SCS-BnB | 4.500 | 675 |
| RANSAC() | 0.007 | 36 |
| RANSAC() | 0.013 | 72 |
| RANSAC() | 0.038 | 203 |
| RANSAC() | 0.344 | 1840 |
| Methods | median time (s) | iteration | median error (°) |
|---|---|---|---|
| Exp-BnB | 1.378 | 239 | 1.167 |
| Ste-circle-BnB | 1.182 | 223 | 1.141 |
| Ste-square-BnB | 88.470 | 118 | 1.141 |
| SCS-BnB | 153.356 | 219 | 1.129 |
| RANSAC() | 0.941 | 113 | 1.140 |
| RANSAC() | 3.792 | 459 | 1.173 |
Globally Optimal Vertical Direction Estimation in Atlanta World
Yinlong Liu, Guang Chen and Alois Knoll
Yinlong Liu and Alois Knoll are with the Department of Informatics, Technische Universität München, München, Germany, 85748. Guang Chen is with Tongji University, Shanghai, China, and with Technische Universität München, München, Germany.
Abstract
In man-made environments, such as indoor and urban scenes, most objects and structures are organized as orthogonal and parallel planes. These planes can be approximated by the Atlanta world assumption, in which the plane normals are represented by the Atlanta frames. The Atlanta world assumption, which can be considered a generalized Manhattan world assumption, has one vertical frame and multiple horizontal frames. Conventionally, given a set of inputs such as surface normals, the Atlanta frame estimation problem is solved in one shot by branch-and-bound (BnB). However, the runtime of the BnB algorithm increases greatly as the dimensionality (i.e., the number of horizontal frames) increases. In this paper, we estimate only the vertical direction instead of all Atlanta frames at once. Accordingly, we propose a vertical direction estimation method that exploits the relationship between the vertical frame and the horizontal frames. Concretely, our approach employs a BnB algorithm to search for the vertical direction with guaranteed global optimality and without requiring prior knowledge of the number of Atlanta frames. Four novel bounds, obtained by mapping the 3D hemisphere to a 2D region, are investigated to guarantee convergence. We verify the validity of the proposed method on various challenging synthetic and real-world data.
Index Terms:
global optimization, branch-and-bound, rotation search, imaging geometry.
1 Introduction
In man-made environments, scenes usually have structural forms (e.g., the layout of buildings and many indoor objects such as furniture), which can be represented by a set of parallel and orthogonal planes [1]. Atlanta world makes an assumption that the man-made scene can be modeled by a horizontal plane (e.g., ground plane) and many vertical planes (e.g., buildings and walls), then the normals of the planes, which are called world frames, can describe the scenes abstractly. In other words, one vertical frame and multiple horizontal frames could represent Atlanta world [2, 3]. Therefore, it is a crucial step to estimate these vertical and horizontal frame directions in computer vision applications, which is named Atlanta frame estimation [3, 4]. More specifically, structural world frame estimation could be utilized as key modules for various high-level vision applications such as scene understanding [5, 1] and SLAM [6, 7].
Mathematically, an orientation in 3D Euclidean space corresponds to a point on the 3D unit sphere (i.e., $\mathbb{S}^2$). This means that Atlanta frame estimation, which estimates multiple orientations, is a multiple-clustering (also multi-model fitting) problem in $\mathbb{S}^2$. There are many general multiple-clustering algorithms [8, 9, 10], and some of them have been applied to structural world frame estimation [11, 12]. However, Atlanta frame estimation is not exactly the same as the general multiple-clustering problem. It has special constraints: all horizontal frames lie in a plane, and the vertical frame is parallel to the normal of that plane. These constraints reflect essential properties of the Atlanta world assumption. Omitting them not only leads to a significant decrease in accuracy but also increases the dimensionality of the problem. Furthermore, most multiple-clustering algorithms cannot guarantee global optimality when there are many outliers in the observations [13, 14]. Therefore, recent developments in structural world frame estimation highlight the need for robust and globally optimal methods that take the above constraints into account [15, 4].
Recently, Manhattan frame estimation [1], which is a special case of Atlanta frame estimation, has been solved efficiently by a branch-and-bound (BnB) method with the orthogonal constraints [15]. However, when the BnB method is extended to the Atlanta world [3, 4], two problems appear:
1. The algorithm requires the number of Atlanta frames to be specified, which can rarely be known in advance. Although an automatic method is proposed in [4] to estimate the number of horizontal directions, if that number is over- or under-estimated, the global optimum may not occur in the correct direction.
2. It suffers from the curse of dimensionality. There are a considerable number of horizontal directions whose relationships are unknown, unlike in the Manhattan world assumption. Consequently, the dimensionality of the problem increases with the number of horizontal frames, and thus the runtime of the BnB algorithm increases greatly.
In this paper, we focus on estimating the unique vertical direction instead of all directions in Atlanta world at once. Compared with methods that solve for all directions in one shot, this has two advantages:
1. More flexible. The vertical direction is unique among the Atlanta frames, and we can estimate it even when the total number of horizontal directions is unknown. Additionally, we can also estimate the vertical direction from some irregular Atlanta world scenes (e.g., cylindrical buildings in Atlanta, whose number of horizontal directions $\rightarrow\infty$).
2. More efficient. Vertical direction estimation is solved in a closed two-dimensional space $\mathbb{S}^2$, which is a low-dimensional problem. In other words, estimating only the vertical direction largely avoids the curse of dimensionality in Atlanta world.
Furthermore, estimating the vertical direction first is favorable to subsequent operations in practical applications (e.g., scene classification [16], parsing indoor scenes [17] and point set registration [18]). In particular, it is also helpful for estimating the horizontal Atlanta frames: given the vertical direction, all horizontal directions lie in a plane, so estimating them becomes a one-dimensional clustering problem in angular space [4]. In other words, given the vertical direction in Atlanta world, it is easy to estimate the horizontal directions with or without knowing the number of horizontal frames [19, 4].
1.1 Related Work
There is a large body of literature concerned with structural world frame estimation [4, 1, 15, 20]. Since it is a clustering problem in $\mathbb{S}^2$ with some orthogonal constraints, we first review the works that apply classical clustering or fitting methods. With the definition of Atlanta world, Expectation Maximization (EM) type algorithms, which are popular for solving chicken-and-egg problems [21], are applied to direction estimation [2]. However, EM-type algorithms are local methods and have no guarantee of global optimality. Therefore, there is an evident risk of local minima, and their performance relies heavily on a good initialization [22]. Besides, RANdom SAmple Consensus (RANSAC) [23, 24, 25] based multi-structure estimation algorithms (e.g., T-linkage [26] and J-linkage [27]) are applied to structural direction estimation [12, 11]. These RANSAC-type methods are fast and accurate and perform best in many cases, but their solutions are sub-optimal due to their heuristic nature [4]. More recently, Straub et al. [1] propose a real-time capable inference algorithm that considers the orthogonal constraints and uses an adaptive Markov-Chain Monte-Carlo sampling algorithm.
To assure global optimality, J. Bazin et al. propose globally optimal methods [4, 14, 13, 3, 15] by applying branch-and-bound algorithm to solve a consensus set maximization problem. The fundamental theory of these global methods is rotation search [28, 29]. Specifically, the problem is solved by combining Interval Analysis theory with BnB algorithm in [14]. By contrast, the method in [13] is a natural application of Hartley and Kahl’s rotation search theory in [28]. Furthermore, 2D-EGI (Extended Gaussian Image) and its integral image are applied in [15] to accelerate the calculation of the bounds in rotation search. Besides, rotation search theory is also extended to Atlanta frame estimation in [3, 4].
However, Atlanta world is geometrically more complex than Manhattan world, since it has more than three frames. Consequently, the global searching method in [3] requires the number of horizontal directions to be hand-tuned according to the scene, which seems unrealistic in practical applications. Therefore, an automatic two-stage method (meta-BnB) is proposed in [4] to estimate the number of directions. Concretely, it first searches for the vertical direction and the horizontal plane in $SO(3)$, then it estimates the horizontal directions in a one-dimensional angle space. It is worth noting that meta-BnB is also based on rotation search theory in $SO(3)$. However, searching for the vertical direction is inherently an optimization in $\mathbb{S}^2$, whose dimensionality is lower than that of $SO(3)$.
Since rotation search theory is closely related to our work, we briefly review it in the context of computer vision. Rotation search has achieved great success in geometric vision problems, for example, point set registration [30, 31], camera calibration [32, 33] and relative pose estimation [34, 28]. Because of this success, several works have focused on improving the efficiency of the algorithm [35, 36, 15, 31].
More specifically, most of the rotation search methods rely on the two following lemmas [28]:
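Lemma 1.
For $\mathbf{R}_1, \mathbf{R}_2 \in SO(3)$ and $\mathbf{v} \in \mathbb{R}^3$,
$$\angle(\mathbf{R}_1\mathbf{v}, \mathbf{R}_2\mathbf{v}) \le \angle(\mathbf{R}_1, \mathbf{R}_2), \tag{1}$$
where $\angle(\mathbf{R}_1, \mathbf{R}_2)$ is the angle, lying in the range $[0,\pi]$, of the rotation $\mathbf{R}_1\mathbf{R}_2^{-1}$, and $\angle(\cdot,\cdot)$ applied to vectors denotes the angular distance between them.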
Lemma 2.
For $\mathbf{R}_1, \mathbf{R}_2 \in SO(3)$,
$$\angle(\mathbf{R}_1, \mathbf{R}_2) \le \|\mathbf{r}_1 - \mathbf{r}_2\|, \tag{2}$$
where $\mathbf{r}_1$ and $\mathbf{r}_2$ are their corresponding angle-axis representations. Lemma 2 states that the angular distance between two rotations is no greater than the Euclidean distance between their angle-axis representations. These two lemmas are the basis for the success of the rotation search theory.
Additionally, it is worth noting that rotation search usually means optimization in $SO(3)$, which is closely related to $\mathbb{S}^3$. Precisely, the homomorphism from the unit quaternion sphere (i.e., $\mathbb{S}^3$) to $SO(3)$ is a two-to-one mapping, so the searching domain may be expressed as a hemisphere (including the equator) of the unit quaternion sphere [36, 37]. However, the estimation of directions in three-dimensional Euclidean space (i.e., Manhattan or Atlanta frames) is inherently an optimization in $\mathbb{S}^2$. Unfortunately, there is still a lack of rigorous theory regarding globally optimal optimization in $\mathbb{S}^2$. In order to estimate the vertical direction in Atlanta world, we propose some new and solid mathematical conclusions about searching in $\mathbb{S}^2$.
1.2 Our Contribution
In this paper, to overcome the curse of dimensionality and avoid requiring the user to specify the number of Atlanta frames, we propose a novel method for vertical direction estimation in Atlanta world. The contributions of this work are mainly as follows:
- We propose a global searching method for estimating the vertical direction, which differs from the conventional rotation search in [4]. Since the domain of the vertical direction is inherently $\mathbb{S}^2$, our searching method is more efficient for vertical direction estimation.
- Four new bounds for the BnB algorithm are investigated. In contrast to rotation search theory in $SO(3)$, more parametrizations of the hemisphere are considered, including exponential mapping, stereographic projection and the spherical coordinate system. To the best of our knowledge, this is the first work to propose such bounds in $\mathbb{S}^2$ for the structural world frame estimation problem.
2 Methods
2.1 Problem Formulation
In this paper, we estimate the vertical direction from the surface normals in Atlanta world. We denote the input normal set as , where is the -th effective unit normal, and is the number of input normals. In addition, the unknown-but-sought vertical direction is denoted as . It is in a hemisphere (), which is defined as:
[TABLE]
where is a unit vector in . Accordingly, the angle of vertical direction and one of the surface normals is lying in range .
To estimate vertical direction robustly, we then apply the inlier maximisation approach to formulate the objective function as
[TABLE]
where is an indicator function which returns 1 if the condition · is true and 0 otherwise and is the logical OR operation. is abs function and is the inlier threshold. Eq. (4b), (4c) and (4d) mean that only the surface normals, which are parallel or perpendicular to vertical direction, are inliers. Additionally, because , and when , is a monotonically decreasing function, then an equivalent formulation can be given by
[TABLE]
Since the reformulation needs no inverse trigonometric operation to recover angles, it is cheaper to evaluate than the angle inequalities.
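As an illustration of the cosine-based reformulation, the following minimal sketch counts inliers for a candidate vertical direction without computing any angle. It is not the authors' implementation: the function name `count_inliers`, the argument names, and the choice to treat anti-parallel normals as inliers (consistent with an angle range of $[0,\pi]$) are assumptions made for this example.

```python
import math


def count_inliers(v, normals, tau):
    """Count unit normals that are parallel or perpendicular to unit vector v.

    tau is the inlier threshold in radians (assumed 0 <= tau < pi/4).
    Hypothetical helper written for illustration only.
    """
    cos_tau = math.cos(tau)
    sin_tau = math.sin(tau)
    count = 0
    for n in normals:
        d = abs(v[0] * n[0] + v[1] * n[1] + v[2] * n[2])
        # angle <= tau or angle >= pi - tau   <=>  |v.n| >= cos(tau)
        # |angle - pi/2| <= tau               <=>  |v.n| <= sin(tau)
        if d >= cos_tau or d <= sin_tau:
            count += 1
    return count


vertical = (0.0, 0.0, 1.0)
normals = [
    (0.0, 0.0, 1.0),                        # parallel to the candidate
    (1.0, 0.0, 0.0),                        # perpendicular to the candidate
    (0.0, 0.0, -1.0),                       # anti-parallel to the candidate
    (math.sqrt(0.5), 0.0, math.sqrt(0.5)),  # 45 degrees away, an outlier
]
n_inliers = count_inliers(vertical, normals, tau=0.05)
```

Only dot products and two precomputed trigonometric constants are needed per candidate, which is the efficiency argument made above.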
Rotation search [4] finds an optimal rotation rather than the optimal direction vector directly. Concretely, given an initial direction vector and because , then . To estimate the vertical direction, it is sufficient to search the entire rotation domain and find the optimal rotation such that the rotated initial vector is the optimal vertical direction.
2.2 Branch-and-Bound
Finding the optimal vertical direction that maximizes the cardinality of the inlier set is by no means a trivial problem [38, 39]. Additionally, outlier observations, which are unavoidable in real applications, increase the “hardness” of the estimation problem, because general robust estimation with outlier observations is well known to be NP-hard [40].
To obtain the robust optimal vertical direction, we then use the BnB algorithm. The BnB algorithm is one of the most commonly used tools for solving NP-hard optimization problems, and it is widely applied in many global optimization problems [41]. Briefly, the BnB algorithm recursively divides the searching space into smaller spaces and estimates the upper bound and lower bound of the optimum in each subspace. Then, it removes the sub-spaces which cannot produce a better solution than the best one found so far by the algorithm. The above process is repeated until the best optimum is found within the desired accuracy. The BnB algorithm for estimating vertical direction globally in Atlanta world is outlined in Algorithm 1. It is worth noting that the algorithm only needs the surface normals and the inlier threshold as the inputs without the prior knowledge of the number of horizontal frames.
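The divide, bound and prune loop described above can be sketched generically as follows. This is a hedged illustration, not Algorithm 1 itself: the function `branch_and_bound`, its callback signatures, the best-first queue, and the toy 2D objective (counting planar points within a radius of the candidate) are all assumptions chosen to keep the example self-contained.

```python
import heapq
import math


def branch_and_bound(upper_fn, lower_fn, center, half_side, max_iter=100000):
    """Best-first BnB over a 2D square, maximising an integer-valued objective.

    upper_fn(center, half_side) -> upper bound of the objective over the square
    lower_fn(center)            -> objective value at the square centre
    """
    best_val, best_x = lower_fn(center), center
    # heapq is a min-heap, so the upper bound is stored negated.
    heap = [(-upper_fn(center, half_side), center, half_side)]
    for _ in range(max_iter):
        if not heap:
            break
        neg_ub, c, s = heapq.heappop(heap)
        if -neg_ub <= best_val:
            break  # no remaining branch can beat the incumbent
        h = s / 2.0
        for dx in (-h, h):
            for dy in (-h, h):
                child = (c[0] + dx, c[1] + dy)
                lb = lower_fn(child)
                if lb > best_val:
                    best_val, best_x = lb, child
                ub = upper_fn(child, h)
                if ub > best_val:  # keep only branches that may still improve
                    heapq.heappush(heap, (-ub, child, h))
    return best_x, best_val


# Toy problem (an assumption for the demo): find the 2D point that has the
# largest number of given points within distance `radius`.
points = [(0.30, 0.30), (0.32, 0.31), (0.29, 0.33), (-0.80, 0.50)]
radius = 0.05


def toy_lower(c):
    return sum(math.hypot(p[0] - c[0], p[1] - c[1]) <= radius for p in points)


def toy_upper(c, s):
    slack = math.sqrt(2.0) * s  # half-diagonal of the square
    return sum(math.hypot(p[0] - c[0], p[1] - c[1]) <= radius + slack
               for p in points)


best_x, best_val = branch_and_bound(toy_upper, toy_lower, (0.0, 0.0), 1.0)
```

The same skeleton applies to any of the 2D parametrizations of the hemisphere once suitable `upper_fn` and `lower_fn` callbacks are supplied.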
The key of the BnB algorithm is estimating the upper and lower bounds of the optimum in each subspace tightly and efficiently. In this paper, two general bounds are proposed as follows:
Proposition 1** (General bounds-1).**
Given a branch , if , , and then the upper bound can be:
[TABLE]
[TABLE]
[TABLE]
the lower bound can be:
[TABLE]
[TABLE]
Proof.
The rigorous proof can be found in appendix A. ∎
Proposition 2** (General bounds-2).**
Given a branch , if , , then the upper bound can be:
[TABLE]
[TABLE]
[TABLE]
where
[TABLE]
the lower bound can be:
[TABLE]
[TABLE]
Proof.
The completed proof can be found in appendix B. ∎
Actually, if they have the same in both general bounds, then . The main difference between general bounds-1 and general bounds-2 is the calculation of the upper bound. More specifically, given a subspace , , which means general bounds-1 is tighter than general bounds-2 (Rigorous mathematical proof can be found in appendix C). In the next sections, we introduce how to calculate the upper bound in detail.
2.3 Parametrizing the Searching Domain
Before estimating the bounds in the BnB algorithm, we must first parametrize the searching space. In this section, we first recall the parametrization of $SO(3)$ in rotation search theory [28, 4], and then introduce three different parametrizations of the hemisphere. Furthermore, we analyze the similarities and differences between the parametrizations of $SO(3)$ and of the hemisphere.
2.3.1 Parametrization of $SO(3)$
It is well known that the rotation space can be minimally parametrized with the angle-axis vector, whose norm is the angle of rotation and whose direction is the axis of rotation. Therefore, the space of all 3D rotations can be represented by a solid ball of radius $\pi$ in $\mathbb{R}^3$ [30]. Furthermore, the $\pi$-ball is usually relaxed to a 3D cube for ease of manipulation in the BnB algorithm. Thus Lemma 1 and 2 can be used to efficiently estimate the bounds in rotation search.
Lemma 2 is one of the most fundamental parts of rotation search theory. Let us look at its details and introduce quaternions to build the connection with the parametrization of $\mathbb{S}^2$. Geometrically, the mapping from unit quaternions ($\mathbb{S}^3$) to rotations ($SO(3)$) is a two-to-one mapping. We then denote a hyper-hemisphere as follows:
[TABLE]
where $\mathbf{q}$ is a unit vector in $\mathbb{R}^4$. Thus the “upper” hemisphere of the unit quaternion sphere is in one-to-one correspondence with the rotation $\pi$-ball, except at the boundary, where the correspondence is two-to-one [28]. The following conclusion holds:
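Lemma 3.
For $\mathbf{q}_1, \mathbf{q}_2$ in the upper quaternion hemisphere,
$$\angle(\mathbf{R}_1, \mathbf{R}_2) = 2\,\angle(\mathbf{q}_1, \mathbf{q}_2), \tag{12}$$
where $\mathbf{R}_1, \mathbf{R}_2$ are the rotations corresponding to $\mathbf{q}_1$ and $\mathbf{q}_2$. This lemma implies that the angle between two rotations is twice the angle between their corresponding quaternions.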
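Additionally, a unit quaternion can be represented by an angle-axis vector as follows:
$$\mathbf{q} = \left[\cos(\theta/2),\ \sin(\theta/2)\,\hat{\mathbf{n}}^{T}\right]^{T}, \tag{13}$$
where $\hat{\mathbf{n}}$ is a unit vector representing the axis of the rotation, and $\theta$ is the angle of rotation, so that the angle-axis vector is $\mathbf{r} = \theta\hat{\mathbf{n}}$. It is an exponential mapping from the upper quaternion hemisphere to the solid $\pi$-ball. Therefore, the following important inequality holds: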
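Lemma 4.
For $\mathbf{q}_1, \mathbf{q}_2$ in the upper quaternion hemisphere,
$$\angle(\mathbf{q}_1, \mathbf{q}_2) \le \tfrac{1}{2}\,\|\mathbf{r}_1 - \mathbf{r}_2\|, \tag{14}$$
where $\mathbf{r}_1$ and $\mathbf{r}_2$ are the angle-axis representations of $\mathbf{q}_1$ and $\mathbf{q}_2$. The complete proofs of Lemma 3 and Lemma 4 can be found in [28] and [37].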
According to Eq. (12) and (14), we can easily obtain Lemma 2. In other words, Lemma 2 separates into two parts, and Lemma 4 inspires us to parametrize the hemisphere by exponential mapping.
2.3.2 Parametrization of the Hemisphere: Exponential Mapping
Geometrically, the search domain of the vertical direction is a hemisphere in three-dimensional Euclidean space, and it is inherently a closed two-dimensional space. To parametrize it minimally, we take inspiration from Lemma 4 and propose an exponential mapping that maps the hemisphere to a 2D disk.
Concretely, let , then it can be represented by a corresponding point in the disk,
[TABLE]
where , is a unit vector in and . Note that the domain of is corresponding to , and geometrically, is the radius of the disk. In BnB algorithm, a square (side=) circumscribing the mapped disk area is used as the vertical direction domain for ease of manipulation.
The mapping is similar to the mapping from the upper quaternion hemisphere to the 3D solid $\pi$-ball. Similarly, the mapping is one-to-one except at the boundary (i.e., the equator), where it is two-to-one. More specifically, the exponential mapping is closely related to Lie theory [42, 43]. However, to avoid distracting readers, this paper does not rely on any knowledge of Lie group theory and focuses on the direction estimation problem.
Because of the similarity between Eq. (15) and Eq. (13), we then propose a similar inequality as follows:
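Proposition 3.
For two points $\mathbf{v}_1, \mathbf{v}_2$ on the hemisphere,
$$\angle(\mathbf{v}_1, \mathbf{v}_2) \le \|\mathbf{d}_1 - \mathbf{d}_2\|,$$
where $\mathbf{d}_1, \mathbf{d}_2$ are the corresponding points of $\mathbf{v}_1, \mathbf{v}_2$ in the 2D disk.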
Proof.
The complete proof is in the appendix D, and the visualization can be found in Fig. 1. ∎
The exponential parametrization has achieved great success in $SO(3)$ and builds the foundation of Lemma 2. In this paper, we extend exponential mapping to the hemisphere and apply Proposition 3 as one of the fundamental parts of our globally optimal vertical direction estimation method.
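The exponential mapping and the inequality of Proposition 3 can be checked numerically with a short sketch. The concrete form used here is an assumption reconstructed by analogy with the quaternion case: a disk point $\mathbf{d} = \alpha\hat{\mathbf{n}}$ with $\alpha \in [0, \pi/2]$ maps to the unit vector $[\sin(\alpha)\hat{\mathbf{n}}^T, \cos\alpha]^T$. All function names are hypothetical.

```python
import math
import random


def disk_to_hemisphere(d):
    """Exponential map: 2D disk point (radius <= pi/2) to a unit vector, z >= 0."""
    alpha = math.hypot(d[0], d[1])  # angle measured from the north pole
    if alpha < 1e-12:
        return (0.0, 0.0, 1.0)
    s = math.sin(alpha) / alpha
    return (s * d[0], s * d[1], math.cos(alpha))


def hemisphere_to_disk(v):
    """Inverse (logarithm) map: unit vector with z >= 0 to a 2D disk point."""
    rho = math.hypot(v[0], v[1])
    if rho < 1e-12:
        return (0.0, 0.0)
    alpha = math.atan2(rho, v[2])
    return (alpha * v[0] / rho, alpha * v[1] / rho)


def angle(a, b):
    """Angular distance between two unit vectors, in radians."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def random_disk_point(rng):
    """Uniform sample in the disk of radius pi/2."""
    r = (math.pi / 2.0) * math.sqrt(rng.random())
    t = 2.0 * math.pi * rng.random()
    return (r * math.cos(t), r * math.sin(t))


def max_violation(trials=2000, seed=0):
    """Largest observed value of angle(v1, v2) - ||d1 - d2|| over random pairs."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        d1, d2 = random_disk_point(rng), random_disk_point(rng)
        v1, v2 = disk_to_hemisphere(d1), disk_to_hemisphere(d2)
        gap = angle(v1, v2) - math.hypot(d1[0] - d2[0], d1[1] - d2[1])
        worst = max(worst, gap)
    return worst
```

Under these assumptions, `max_violation()` should not exceed zero beyond floating-point noise, which is what Proposition 3 predicts.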
2.3.3 Parametrization of the Hemisphere: Stereographic Projection
In geometry, stereographic projection is a particular mapping that projects a hemisphere to a disk in a plane, which means we can also represent the hemisphere minimally by applying stereographic projection.
The stereographic projection is described in Fig. 2. We denote a point in the equatorial plane and its corresponding point , and if the projection pole is at (South Pole)(see [44]), then we have:
[TABLE]
Clearly, we can parametrize the hemisphere minimally by stereographic projection. Similarly, a square (side $=2$) circumscribing the mapped disk area is used as the vertical direction domain in the BnB algorithm. It is worth noting that stereographic projection was also applied to accelerate the calculation in rotation search [35], which inspires our work.
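For reference, the standard stereographic projection from the South Pole $(0,0,-1)$ onto the equatorial plane, and its inverse, can be written as the following sketch. The function names are hypothetical, and the formulas are the textbook ones for this pole and plane rather than a transcription of the paper's equation.

```python
import math


def stereographic(v):
    """Project a unit vector (z > -1) from the South Pole onto the plane z = 0."""
    k = 1.0 / (1.0 + v[2])
    return (v[0] * k, v[1] * k)


def inverse_stereographic(p):
    """Map a point of the equatorial plane back onto the unit sphere."""
    rho2 = p[0] * p[0] + p[1] * p[1]
    k = 1.0 / (1.0 + rho2)
    return (2.0 * p[0] * k, 2.0 * p[1] * k, (1.0 - rho2) * k)


# Points of the upper hemisphere (z >= 0) land inside the unit disk:
north_image = stereographic((0.0, 0.0, 1.0))    # centre of the disk
equator_image = stereographic((1.0, 0.0, 0.0))  # boundary of the disk
```

With this convention the hemisphere maps onto the unit disk, so the circumscribing square has half-side 1.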
2.3.4 Parametrization of the Hemisphere: Spherical Coordinate System
The hemisphere can also be parameterized by a spherical coordinate system (Wikipedia: “spherical coordinate system”). Geometrically, the hemisphere is flattened to a rectangle (see Fig. 3). In the BnB algorithm, the rectangular region can be set as the initial searching domain.
For and its corresponding point , the mapping from three-dimensional Cartesian coordinate system to spherical coordinate system is [45]
[TABLE]
where $\operatorname{atan2}$ is the four-quadrant inverse tangent function (see https://ww2.mathworks.cn/help/matlab/ref/atan2.html). Conversely [46],
[TABLE]
where and are azimuth angle and elevation angle, respectively.
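A minimal sketch of the azimuth-elevation conversion is given below. It assumes the common convention in which azimuth is measured in the x-y plane and elevation is measured from that plane towards the z-axis (the convention of MATLAB's `cart2sph` for unit vectors); whether this matches the paper's equations exactly is an assumption, and the function names are hypothetical.

```python
import math


def cart_to_sph(v):
    """Unit vector -> (azimuth, elevation) in radians."""
    azimuth = math.atan2(v[1], v[0])                       # in [-pi, pi]
    elevation = math.atan2(v[2], math.hypot(v[0], v[1]))   # in [-pi/2, pi/2]
    return azimuth, elevation


def sph_to_cart(azimuth, elevation):
    """(azimuth, elevation) in radians -> unit vector."""
    c = math.cos(elevation)
    return (c * math.cos(azimuth), c * math.sin(azimuth), math.sin(elevation))


# For the upper hemisphere the elevation stays in [0, pi/2], so the search
# domain is an azimuth-elevation rectangle.
az, el = cart_to_sph((0.0, 0.0, 1.0))
```

Note that the whole top edge of the rectangle (elevation $\pi/2$) corresponds to the single north-pole direction, which is one source of the distortion discussed later for the SCS bounds.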
In summary, the space of $SO(3)$ is parametrized and relaxed to a 3D cube, so in the BnB algorithm the cube is recursively divided into eight sub-cubes. By contrast, both the exponential mapping and the stereographic projection parametrize the hemisphere as a disk, which is then relaxed to a solid square. In the BnB algorithm, we recursively subdivide it into four smaller squares and estimate the upper bound and lower bound of the optimum in each sub-branch. Lastly, the hemisphere can be parameterized by an azimuth-elevation rectangle using the spherical coordinate system. Similarly, in the BnB algorithm, the rectangle is recursively divided into four smaller rectangles.
For ease of understanding, we call a point in the solid disk/rectangle an image-point, and its corresponding point on the hemisphere a preimage-point.
2.4 Estimating Bounds
In this section, we show how to calculate the bounds with different parametrizations in detail.
2.4.1 Bounds of Rotation Search
We first recall the bounds applied in rotation search. According to Lemma 1 and Lemma 2,
[TABLE]
[TABLE]
Then, we have the following Lemma.
Lemma 5** (rotation uncertainty angle bound).**
Given a divided cube-shaped rotation branch , whose center is , half-side is . For ,
[TABLE]
where is matrix representation of . Let initial vertical direction , and . Then,
[TABLE]
Observe that it satisfies the conditions of Proposition 2: , , and .
Then, given a divided cube-shaped rotation branch , the bounds can be
[TABLE]
[TABLE]
Note that these bounds are widely used in many geometric vision problems [4, 15] and are not our original contribution. Besides, it is worth noting that there seems to be a tighter bound than Eq. (26) in [31]; however, computing it efficiently relies on two unproven assumptions.
2.4.2 Bounds Using Exponential Mapping
According to Proposition 3, we have,
Proposition 4**.**
Given a divided square-shaped branch in exponential mapping plane, whose center is , and half-side is . For ,
[TABLE]
where are preimage points of and .
Proof.
This proposition can be derived as follows:
[TABLE]
which follows Proposition 3 (see Fig. 1). ∎
Proposition 4 and Lemma 5 have similar formulations. However, to the best of our knowledge, it is the first time Proposition 4 has been explicitly introduced to the computer vision field. Obviously, given divided square-shaped branch in exponential mapping plane, according to Proposition 2, the bounds can be:
[TABLE]
[TABLE]
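To make the exponential-mapping bounds concrete, the sketch below evaluates a lower and an upper bound for one square branch. It is an illustration under stated assumptions, not the paper's exact formulas: the uncertainty angle is taken as $\sqrt{2}$ times the half-side (the half-diagonal of the square, following Propositions 3 and 4), the upper bound simply relaxes the inlier threshold by that angle, anti-parallel normals are counted as inliers, and all function names are hypothetical.

```python
import math


def disk_to_hemisphere(d):
    """Exponential map from the 2D disk to the unit sphere (assumed form)."""
    alpha = math.hypot(d[0], d[1])
    if alpha < 1e-12:
        return (0.0, 0.0, 1.0)
    s = math.sin(alpha) / alpha
    return (s * d[0], s * d[1], math.cos(alpha))


def angle(a, b):
    """Angular distance between two unit vectors, in radians."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def exp_bounds(center, half_side, normals, tau):
    """Return (upper, lower) inlier-count bounds for a square branch."""
    delta = math.sqrt(2.0) * half_side  # uncertainty angle of the branch
    vc = disk_to_hemisphere(center)     # preimage of the branch centre
    upper = lower = 0
    for n in normals:
        a = angle(vc, n)
        a = min(a, math.pi - a)         # fold anti-parallel onto parallel
        perp_gap = math.pi / 2.0 - a    # distance from being perpendicular
        if a <= tau or perp_gap <= tau:
            lower += 1                  # inlier at the centre itself
        if a <= tau + delta or perp_gap <= tau + delta:
            upper += 1                  # could be an inlier somewhere in the branch
    return upper, lower


normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
small_branch = exp_bounds((0.0, 0.0), 0.01, normals, tau=0.05)
large_branch = exp_bounds((0.3, 0.0), 0.2, normals, tau=0.05)
```

As the branch shrinks, the relaxation term vanishes and the two bounds coincide, which is what lets the BnB loop terminate.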
2.4.3 Bounds Using Stereographic Projection
Stereographic projection has a crucial property that circles are projected as circles (circle preserving [35, 47]). We use this property to calculate the first bound based on stereographic projection.
Proposition 5**.**
Given a divided square-shaped branch in stereographic projection plane, and its circumscribed circle is . The preimage of is in , whose radius is and the direction of its center point is ; , is its preimage-point,
[TABLE]
Proof.
Because , then its preimage-point . The angle of and must be no greater than the maximum angle (see Fig. 4). ∎
We then explain how to calculate and in detail. Given a divided square-shaped branch in the mapped plane, its four vertices () must lie on the circumscribed circle (see Fig. 4). Then the preimage-points of the vertices () must lie on the edge of , and the edge of is a circle. The direction of the center point is perpendicular to the plane containing the circle. Hence, is perpendicular to any vector in the circle-plane, which means and . Let . Then, and .
Intuitively, Proposition 5 shows that a divided square-shaped branch in the stereographic projection plane is relaxed to a circle, while the corresponding domain on the 3D sphere is relaxed to an umbrella-shaped patch bounded by a circle, whose radius is .
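The construction of the spherical patch can be sketched as follows: lift the square's vertices to the sphere, take the normal of the plane through three of them as the patch centre direction, and read off the angular radius. This is an assumed reading of the procedure described above. In particular, the orientation rule (choosing the side of the plane that excludes the South Pole, which is the projection pole) is an addition made for this example, and all names are hypothetical.

```python
import math


def inverse_stereographic(p):
    """Map a point of the equatorial plane back onto the unit sphere."""
    rho2 = p[0] * p[0] + p[1] * p[1]
    k = 1.0 / (1.0 + rho2)
    return (2.0 * p[0] * k, 2.0 * p[1] * k, (1.0 - rho2) * k)


def angle(a, b):
    """Angular distance between two unit vectors, in radians."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def cap_of_square(center, half_side):
    """Return (axis, radius) of the spherical cap bounded by the preimage of
    the circle circumscribing the square branch."""
    cx, cy = center
    s = half_side
    corners = [(cx - s, cy - s), (cx + s, cy - s), (cx + s, cy + s)]
    p1, p2, p3 = [inverse_stereographic(c) for c in corners]
    a = [p2[i] - p1[i] for i in range(3)]
    b = [p3[i] - p1[i] for i in range(3)]
    # Normal of the plane through the three lifted vertices.
    m = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(x * x for x in m))
    axis = tuple(x / norm for x in m)
    k = sum(axis[i] * p1[i] for i in range(3))  # plane offset, cos(radius)
    # The cap {x : axis.x >= k} must not contain the South Pole (0, 0, -1).
    if -axis[2] > k:
        axis = tuple(-x for x in axis)
        k = -k
    radius = math.acos(max(-1.0, min(1.0, k)))
    return axis, radius


axis, radius = cap_of_square((0.3, -0.2), 0.1)
fourth_vertex = inverse_stereographic((0.3 - 0.1, -0.2 + 0.1))
```

Because stereographic projection preserves circles, the fourth vertex (not used in the construction) lies at the same angular distance from `axis` as the other three.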
Given a divided square-shaped branch in stereographic projection plane, according to Proposition 2, the bounds can be
[TABLE]
[TABLE]
[TABLE]
The bounds (Eq. (35) and Eq. (36)) are called circle-bounds using stereographic projection as the divided square is relaxed to its circumscribed circle.
2.4.4 Tighter Bounds Using Stereographic Projection
For the stereographic projection, a tighter bound can be found without relaxing the divided square, and therefore, it does not apply the circle-preserving property.
Given a divided square-shaped branch in stereographic projection plane, the preimage of its center is . is the preimage-point of .
[TABLE]
Considering the Proposition 1, the bounds can be
[TABLE]
[TABLE]
where .
The detailed implementation for calculating the and can be found in appendix E. Note that the bounds are tighter than circle-bounds using stereographic projection. The reason is simple that a divided square is relaxed to a circle in the circle-bounds but no relaxation in the tighter bounds. Additionally, because the bounds are based on the divided square, then we call the tighter bounds square-bounds using stereographic projection.
2.4.5 Bounds Using Sphere Coordinate System
In this part, we introduce the upper and lower bounds using sphere coordinate system according to Proposition 1.
Given a divided rectangle-shaped branch in azimuth-elevation rectangle, the preimage of its center is . is the preimage-point of .
[TABLE]
Then, the bounds can be
[TABLE]
[TABLE]
where .
The implementation for calculating and can be found in appendix F.
2.4.6 Comparison of the Bounds
It is well known that the success of a BnB algorithm is mainly predicated on the quality of its bounds. To show the relaxation and the tightness, in this section, we compare these bounds (Table I) geometrically.
Bounds of rotation search. The searching domain is parametrized as a 3D cube. In BnB, each divided sub-cube is first relaxed to its circumscribed ball and then relaxed to a region on the quaternion sphere (Lemma 2). Lastly, it is relaxed to a spherical patch in $\mathbb{S}^2$ using Lemma 1.
Bounds using exponential mapping (exp bounds). Similarly, the searching domain is parametrized as a 2D square. The divided sub-square is first relaxed to its circumscribed circle and then relaxed to a spherical patch in $\mathbb{S}^2$ (Proposition 4). Therefore, it has a two-step geometrical relaxation.
Circle-bounds using stereographic projection (ste-circle bounds). The searching domain is parametrized as a 2D square. The divided sub-square is first relaxed to its circumscribed circle, which corresponds to a spherical patch in $\mathbb{S}^2$ (circle preserving). Geometrically, it has only one relaxation step.
Square-bounds using stereographic projection (ste-square bounds). The searching domain is the same as that of the ste-circle bounds, however, the ste-square bounds have no geometrical relaxations.
Bounds using sphere coordinate system (SCS bounds). The searching domain is parametrized as a 2D azimuth-elevation rectangle, which leads to significant distortions. Nonetheless, they have no geometrical relaxations.
Note that what we say about geometrical relaxation is only for one specific input. There is a relaxation for the objective, which relaxes the connections among the inputs. In other words, for a large branch, it hardly obtains the upper bound simultaneously for all inputs.
Computational efficiency. The exp-bounds and the ste-circle bounds are calculated more efficiently than the ste-square bounds and the scs-bounds. This is because estimating and in the ste-square bounds and the scs-bounds requires calculating the angle range between the and the four edges of the branch. However, given a branch , all share the same in the exp-bounds and the ste-circle bounds.
3 Experiments
In this section, we verify the validity of the proposed method on challenging synthetic and real-world data. Firstly, we compared our proposed methods with RANSAC and the rotation search method to show robustness and efficiency. Then, full Atlanta frame estimation experiments were conducted to verify that estimating the vertical direction was helpful for estimating all Atlanta frames. Lastly, we tested the proposed methods on two real-world datasets to verify their practicality. All methods were implemented in Matlab 2019a (code: https://github.com/Liu-Yinlong/Globally-optimal-vertical-direction-estimation-in-Atlanta-world) and executed on an AMD Ryzen 7 2700X 3.7GHz CPU.
3.1 Experimental Setting
The settings of the approaches and pipelines used in the experiments were as follows:
- RANSAC: The minimal sample size was 2. Two inlier inputs yield three candidate directions (the two inlier directions and their cross product), one of which may be the vertical direction. The confidence level was used as the stopping criterion [23]. The number of iterations was typically taken as
[TABLE]
where was the outlier proportion, returned the nearest integer greater than or equal to the input.
- RS: Algorithm 1 with the rotation search bounds. Note that these bounds were also used in the meta-BnB in [4]. We did not use the Extended Gaussian Image (EGI) and its integral image [4, 15], because we focused on the geometry and the validity of the proposed bounds. There might be more efficient ways to calculate the proposed bounds, but they are outside the scope of this paper.
- Exp-BnB: Algorithm 1 with the proposed bounds using exponential mapping.
- Ste-circle-BnB: Algorithm 1 with the proposed circle-bounds using stereographic projection.
- Ste-square-BnB: Algorithm 1 with the proposed square-bounds using stereographic projection.
- SCS-BnB: Algorithm 1 with the proposed bounds using the spherical coordinate system.
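The RANSAC stopping criterion in the settings above can be sketched as follows. The snippet assumes the standard form N = ceil(log(1 - p) / log(1 - (1 - eta)^s)) with minimal sample size s = 2, where p is the confidence level and eta the outlier proportion. The symbol names, the function name `ransac_iterations`, and the default confidence of 0.99 are illustrative assumptions, not values confirmed by the paper.

```python
import math

def ransac_iterations(outlier_ratio: float, confidence: float = 0.99,
                      sample_size: int = 2) -> int:
    """Number of RANSAC samples needed so that, with probability `confidence`,
    at least one minimal sample contains only inliers (standard criterion)."""
    if not 0.0 <= outlier_ratio < 1.0:
        raise ValueError("outlier_ratio must lie in [0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    # Probability that one minimal sample is outlier-free.
    p_all_inliers = (1.0 - outlier_ratio) ** sample_size
    if p_all_inliers >= 1.0:
        return 1  # no outliers: a single sample suffices
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

for eta in (0.65, 0.75, 0.85):
    print(eta, ransac_iterations(eta))
```

Because the sample size is only 2, the required number of samples grows moderately with the assumed outlier proportion, which is why RANSAC stays fast as long as that proportion is not underestimated.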
In addition, to simulate corrupted inputs in the synthetic experiments, noise and outliers were added. For noise, was the -th random vector, whose elements were uniformly distributed in the interval . The noise was simulated by
[TABLE]
where was the amplitude of the noise. For outliers, random orientations were added to the inputs. The total number of inputs was denoted by and the number of outlier inputs by ; then was the outlier proportion.
3.2 Synthetic Data Experiments
3.2.1 Synthetic Atlanta World
To simulate synthetic Atlanta world data, a random orientation was generated as the vertical direction (). Unless otherwise specified, 20% of the inlier inputs were parallel to the vertical direction, and the other 80% were randomly generated to be perpendicular to the vertical direction and thus lay in the "horizontal plane". Note that the number of horizontal frames was not specified. The inlier threshold was set according to the noise level in all the synthetic experiments. Once the vertical direction was estimated as , the error was calculated by
[TABLE]
To evaluate the results, the vertical error and runtime were recorded. Additionally, because the number of iterations of the BnB algorithm reflects the tightness of the bounds, the iterations of the BnB algorithm with different bounds were also recorded. Moreover, to reduce the effect of randomness, each setting was repeated for 500 trials.
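The synthetic protocol described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' Matlab code: the noise model (adding a uniform random vector in [-1, 1]^3 scaled by `sigma`, then renormalizing) and the sign-invariant angular error are assumptions, and every function name is hypothetical.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def random_direction(rng):
    # Normalized Gaussian samples are uniformly distributed on the sphere.
    return normalize(tuple(rng.gauss(0.0, 1.0) for _ in range(3)))

def synthetic_atlanta(n_total, outlier_ratio, sigma, rng, vertical_share=0.2):
    """Return (vertical, normals): inliers first (vertical ones, then
    horizontal ones), followed by randomly oriented outliers."""
    vertical = random_direction(rng)
    n_out = round(outlier_ratio * n_total)
    n_in = n_total - n_out
    n_vert = round(vertical_share * n_in)
    normals = []
    for i in range(n_in):
        if i < n_vert:
            clean = vertical  # parallel to the vertical direction
        else:
            # Perpendicular to the vertical direction (in the horizontal plane).
            clean = normalize(cross(vertical, random_direction(rng)))
        noise = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))  # assumed model
        normals.append(normalize(tuple(c + sigma * e for c, e in zip(clean, noise))))
    normals.extend(random_direction(rng) for _ in range(n_out))
    return vertical, normals

def vertical_error_deg(estimate, truth):
    # Assumed sign-invariant error: v and -v describe the same vertical axis.
    return math.degrees(math.acos(min(1.0, abs(dot(estimate, truth)))))

rng = random.Random(0)
vertical, normals = synthetic_atlanta(n_total=100, outlier_ratio=0.5, sigma=0.05, rng=rng)
print(len(normals), vertical_error_deg(vertical, tuple(-x for x in vertical)))
```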
Controlled experiments. We first tested all the methods with different outlier ratios and different noise levels . The number of inputs was . The results are shown in Fig. 5. From the results, we can draw the following conclusions:
- All four types of bounds in $\mathbb{S}^2$ and the bounds of rotation search could be nested into the BnB algorithm to estimate the vertical direction globally in Atlanta world.
- The four bounds in $\mathbb{S}^2$ had different efficiency. Nevertheless, all of them were more efficient than the bounds of rotation search.
- Broadly, the exp-BnB and the ste-circle-BnB had similar efficiency, as did the ste-square-BnB and the SCS-BnB. More specifically, the first two were more efficient than the last two.
- Generally, the ste-square-BnB and the SCS-BnB required fewer iterations than the exp-BnB and the ste-circle-BnB. This indicated that the ste-square bounds and the SCS bounds were tighter, which was consistent with the previous theoretical analysis.
There are three main reasons why rotation search is rather inefficient for vertical direction estimation.
1. Multiple solutions. Since , if the initial direction and the optimal vertical direction are fixed, there are numerous solutions for [48]. In other words, if is correct, then is also correct, where is a rotation about axis . Therefore, all are solutions. When there are multiple solutions, many branches lie very close to one of the solutions and their objective values are also close to the optimal value, so the BnB algorithm must spend a lot of time pruning these branches.
2. Higher dimensionality. Since vertical direction estimation is inherently a two-dimensional problem, searching in a higher-dimensional domain leads to lower efficiency.
3. Conservative bound. Since the rotation search bounds involve a three-step geometrical relaxation, they are relatively conservative.
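The first reason can be checked numerically. In the sketch below, a rotation `R` maps an initial direction `v0` to a target direction; composing `R` with any rotation about `v0` yields a different rotation that maps `v0` to the same target. The variable names and the particular axis and angles are hypothetical choices for illustration only.

```python
import math

def rotation_about(axis, angle):
    """Rodrigues formula: 3x3 rotation matrix about `axis` by `angle` (radians)."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [[c + t * x * x, t * x * y - s * z, t * x * z + s * y],
            [t * x * y + s * z, c + t * y * y, t * y * z - s * x],
            [t * x * z - s * y, t * y * z + s * x, c + t * z * z]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

v0 = (0.0, 0.0, 1.0)                       # initial direction (hypothetical)
R = rotation_about((1.0, 2.0, -0.5), 0.9)  # one rotation reaching the target
v_star = apply(R, v0)                      # target vertical direction

max_dev = 0.0
for k in range(12):
    theta = 2.0 * math.pi * k / 12
    R_alt = matmul(R, rotation_about(v0, theta))  # a different rotation
    dev = max(abs(p - q) for p, q in zip(apply(R_alt, v0), v_star))
    max_dev = max(max_dev, dev)
print(max_dev)  # stays at round-off level for every theta
```

The one-parameter family of equally good rotations is exactly the redundancy that a search restricted to directions avoids.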
Furthermore, why did the exp-BnB and ste-circle-BnB algorithms require more iterations yet still run faster? On the one hand, tighter bounds prune branches more aggressively and yield fewer iterations. On the other hand, using tighter bounds in BnB might be counter-productive if calculating the bounds themselves takes significant time.
Challenging experiments. We conducted more experiments on challenging data. In this part, we only tested the bounds in $\mathbb{S}^2$, as the bounds of rotation search were obviously less efficient. The number of inputs was fixed at . First, all methods were tested with different high outlier ratios and different noise levels . The results are shown in Fig. 6. Second, all methods were tested with different large noise levels and different outlier ratios . The results are shown in Fig. 7. From all the results, we can draw the following conclusions:
- The exp-BnB had the highest efficiency among all BnB-based methods in these experimental settings. It is worth noting that in cases with a large outlier ratio (), the exp-BnB algorithm was even comparable in efficiency to RANSAC.
- The ste-square-BnB required the fewest iterations among all BnB-based methods, which showed that its bounds were very tight.
Theoretically, neither the ste-square bounds nor the SCS bounds involve geometrical relaxation, so why did the SCS bounds need more iterations than the ste-square bounds in the challenging experiments? This was due to the large distortion of the searching domain. For example, the domain near the optimal direction in $\mathbb{S}^2$ might be mapped to an enlarged region in the azimuth-elevation rectangle; therefore, the BnB algorithm needed more iterations to prune the near-optimal branches.
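The distortion of the azimuth-elevation parametrization can be made concrete with the solid angle of a rectangular cell, which equals the azimuth width times the difference of the sines of the elevation limits. The snippet below is a small illustrative sketch; the function name and the 10-degree cell size are arbitrary choices, not settings from the paper.

```python
import math

def patch_solid_angle(az0, az1, el0, el1):
    """Area on the unit sphere of the cell [az0, az1] x [el0, el1] (radians)."""
    return (az1 - az0) * (math.sin(el1) - math.sin(el0))

step = math.radians(10.0)
near_equator = patch_solid_angle(0.0, step, 0.0, step)
near_pole = patch_solid_angle(0.0, step, math.pi / 2 - step, math.pi / 2)
ratio = near_equator / near_pole
print(near_equator, near_pole, ratio)
```

Two cells of identical size in the azimuth-elevation rectangle therefore cover very different areas on the sphere: in this example the cell touching the pole covers less than one tenth of the area of the cell at the equator. Conversely, a fixed neighborhood of a near-polar direction spreads over a much larger part of the rectangle, so more branches remain close to the optimum.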
3.2.2 Full Atlanta frame estimation
In this part, we verified the performance of the proposed bounds in the full Atlanta frame estimation problem. For the sake of fairness, the experiments were conducted on a synthetic Manhattan world, and the rotation search method was taken from [13] without EGI acceleration. Our proposed methods first estimated the vertical frame direction and then estimated the horizontal frames with a one-dimensional clustering method; we refer to these as sequential methods (see Appendix G for more details).
To generate the input normals in the Manhattan world, we randomly selected a point in $SO(3)$ as the Manhattan frames. In other words, each column of corresponded to a Manhattan frame. The experimental settings were and was from 0.1 to 0.6. Once the frame directions had been estimated as , the estimation error was measured by
[TABLE]
where was the average function, was the element-wise arccosine function, and was the column-wise max function. It computed the average error of the three frames. Note that the solution of the rotation search method inherently satisfies the orthogonality constraint, while the solutions of our sequential methods were built without this constraint, as they were formulated for general Atlanta frame estimation.
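One plausible reading of this error measure (average of the element-wise arccosine of a column-wise maximum) is sketched below. Taking the absolute value of the cosines, so that a frame and its negation count as the same direction, is an assumption, as are the function name and the representation of the frames as the columns of 3x3 nested lists.

```python
import math

def manhattan_frame_error_deg(r_true, r_est):
    """Mean, over the estimated frames, of the angle (degrees) to the
    best-matching ground-truth frame. Frames are the matrix columns."""
    errors = []
    for j in range(3):          # each estimated frame (column of r_est)
        best = 0.0
        for i in range(3):      # each ground-truth frame (column of r_true)
            cosine = abs(sum(r_true[k][i] * r_est[k][j] for k in range(3)))
            best = max(best, cosine)  # column-wise maximum
        errors.append(math.degrees(math.acos(min(1.0, best))))
    return sum(errors) / 3.0

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = math.radians(10.0)
rot_z = [[math.cos(a), -math.sin(a), 0.0],
         [math.sin(a), math.cos(a), 0.0],
         [0.0, 0.0, 1.0]]
print(manhattan_frame_error_deg(identity, rot_z))  # two frames off by 10 degrees, one exact
```

Because each estimated frame is matched to its closest true frame independently, the measure does not require the estimate to be an orthogonal triple, which suits sequential methods that do not enforce orthogonality.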
The results are shown in Fig. 8. The accuracy of the rotation search method was slightly better than that of the sequential methods at large noise levels. This was because the rotation search method enforced the orthogonality constraints among the three frames. Nevertheless, the runtime of rotation search was much longer than that of the sequential methods, because the sequential methods had lower dimensionality, tighter bounds and fewer iterations in the BnB framework.
3.3 Real Data Experiments
3.3.1 NYUv2 Data
We tested our method on the NYUv2 dataset [49], which contains 1449 RGB images along with the corresponding depth maps and camera information. The data cover a variety of indoor scenes that can be regarded as man-made structured environments. In our experiments, we used the data to estimate the vertical direction of the scenes. Concretely, we generated the normals from the depth image with the Matlab built-in function pcnormals and estimated the vertical direction from the downsampled normal data () for all scenes. The threshold was set to in all methods. For RANSAC, were tested since the ground-truth outlier ratio of each scene was unknown, and the sample iteration was , which was determined by (Eq. (45)).
The distribution of the error (, see Eq. (47)) is shown in Fig. 9. The results revealed that the estimation errors of the BnB algorithms were all concentrated at and . This was because the dataset contains some degenerate scenes, which reduce to the Manhattan assumption or, even worse, contain only two main orthogonal frames (Fig. 11). Estimating the vertical direction in such degenerate scenes might return a frame direction in the horizontal plane. Consequently, some errors were concentrated at . Furthermore, when the outlier proportion was set low, the estimation errors of the RANSAC algorithm were not concentrated. When the outlier proportion was set high, the errors were also concentrated. Moreover, to demonstrate the results quantitatively, the -recall curve is presented in Fig. 10, where a case counted as a success if or .
Furthermore, the four bounds in $\mathbb{S}^2$ had different efficiency. The distributions of iterations and runtime on the NYUv2 data are shown in Fig. 12, and the median runtime and number of iterations are listed in Table II. The exp-BnB algorithm was clearly the most efficient. RANSAC ran very fast when the outlier proportion was set low, but it might return incorrect results. If the outlier proportion was set high (), its runtime was longer than that of the exp-BnB algorithm. Besides, to compare with rotation search, we directly quote the results from [15]. With rotation search bounds, it takes 117.06s on average to estimate the Manhattan frames for each scene without input sampling. However, with the efficient bounds computation method proposed in [15], it takes only 0.07s on average.
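The trade-off between bound tightness and bound cost can be read off Table II by dividing the median runtime by the median number of iterations. This ratio of medians is only a rough proxy for the true per-iteration cost, and the helper name `ms_per_iteration` is hypothetical; the numbers are copied from Table II.

```python
# Median runtime (s) and median number of iterations, copied from Table II.
table_ii = {
    "Exp-BnB": (0.134, 816),
    "Ste-circle-BnB": (0.184, 1010),
    "Ste-square-BnB": (4.256, 478),
    "SCS-BnB": (4.500, 675),
}

def ms_per_iteration(median_time_s, median_iterations):
    """Rough per-iteration cost in milliseconds (ratio of two medians)."""
    return 1000.0 * median_time_s / median_iterations

cost = {name: ms_per_iteration(*row) for name, row in table_ii.items()}
for name, value in sorted(cost.items(), key=lambda item: item[1]):
    print(f"{name:15s} {value:7.3f} ms per iteration (rough)")
```

Under this proxy, the ste-square and SCS bounds cost tens of times more per iteration than the exp and ste-circle bounds, so their lower iteration counts do not compensate for the more expensive bound evaluation.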
3.3.2 Outdoor Data
In this part, we verified the validity of our methods on an outdoor scene. The dataset [50] was recorded in the old town of Bremen, Germany (see Fig. 13). It contains 13 3D scans, each with up to 22,500,000 points. Estimating the vertical direction first might be useful for registering the scenes [18]. Each scene was considered an Atlanta world, and was roughly set as the ground truth of the vertical direction. We first downsampled the inputs using the Matlab built-in function pcdownsample. More specifically, a box grid filter with gridStep set to 0.25 was used to reduce the inputs (). After that, their normals were computed by pcnormals, and lastly the vertical direction was estimated from the obtained normals. were set in RANSAC, and the inlier threshold was for all methods in these experiments.
The results can be found in Table III (see Appendix H for the results of each scene). Note that the ground truth for the vertical direction was only roughly set, so the errors merely indicate that the vertical direction estimates were roughly correct. In this outdoor setting, all bounds in $\mathbb{S}^2$ could be nested into the BnB algorithm to globally estimate the vertical direction. Furthermore, the results showed that the exp-BnB and ste-circle-BnB algorithms had similar performance and were more efficient. Ste-square-BnB required the fewest iterations among all the methods; however, it needed more time to calculate the bounds. The SCS-BnB algorithm needed much more time to estimate the vertical direction in these experiments. Note that RANSAC could also obtain similar results in this experimental setting. In addition, the rotation search method could not terminate within 1800s (30min) for any scene. However, according to the results in [4], with the help of an acceleration method, it takes about 80s to estimate the Atlanta frames in the whole scene.
4 Conclusion
In this paper, we propose a novel method for estimating the vertical direction in Atlanta world. It obtains the globally optimal solution by applying the BnB algorithm, without requiring any prior knowledge of the number of frames. Since estimating the vertical direction is inherently a two-dimensional problem, we propose new bounds in $\mathbb{S}^2$ for BnB, which differ from the conventional bounds in rotation search.
The experimental results show that all the bounds (in $\mathbb{S}^2$ or $SO(3)$) can be nested inside the BnB algorithm to obtain the global solution, and that the bounds in $\mathbb{S}^2$ outperform the bounds in $SO(3)$, which are the state-of-the-art technique, for estimating the vertical direction globally. Furthermore, the four bounds in $\mathbb{S}^2$ perform differently. Generally, exp-BnB and ste-circle-BnB have similar performance and are more efficient. Although ste-square-BnB and SCS-BnB have tighter bounds, they are rather inefficient because of the heavy computational burden. In addition to the quality of the bounds, an appropriate parametrization of the searching domain is also an important factor in the efficiency of the BnB algorithm. This is why ste-square-BnB is more efficient than the SCS-BnB algorithm.
Lastly, since ste-square-BnB requires the fewest iterations, accelerating the calculation of the ste-square bounds may yield a faster BnB algorithm in future work. In addition, since the ste-square bounds in $\mathbb{S}^2$ are very tight according to the experimental results, there may similarly be tighter provable bounds for rotation search ($SO(3)$) [30, 31].
Acknowledgments
The research leading to these results has partially received funding from the European Union's Horizon 2020 Research and Innovation Program under Grant Agreement No. 785907 (HBP SGA2), from the program of Tongji Hundred Talent Research Professor 2018, and from the Shanghai AI Innovative Development Project 2018. Yinlong Liu is funded by the China Scholarship Council (CSC).
- [1] J. Straub, O. Freifeld, G. Rosman, J. J. Leonard, and J. W. Fisher, “The manhattan frame model—manhattan world inference in the space of surface normals,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 1, pp. 235–249, 2018.
- [2] G. Schindler and F. Dellaert, “Atlanta world: An expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 1. IEEE, 2004, pp. I–I.
- [3] K. Joo, T.-H. Oh, I. So Kweon, and J.-C. Bazin, “Globally optimal inlier set maximization for atlanta frame estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5726–5734.
- [4] K. Joo, T.-H. Oh, I. S. Kweon, and J.-C. Bazin, “Globally optimal inlier set maximization for atlanta world understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
- [5] V. Hedau, D. Hoiem, and D. Forsyth, “Recovering the spatial layout of cluttered rooms,” in 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009, pp. 1849–1856.
- [6] N. Sünderhauf and P. Protzel, “Switchable constraints for robust pose graph slam,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 1879–1884.
- [7] H. Zhou, D. Zou, L. Pei, R. Ying, P. Liu, and W. Yu, “Structslam: Visual slam with building structure lines,” IEEE Transactions on Vehicular Technology, vol. 64, no. 4, pp. 1364–1375, 2015.
- [8] L. Magri and A. Fusiello, “Multiple model fitting as a set coverage problem,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3318–3326.
