An Efficient Algebraic Solution to the Perspective-Three-Point Problem
Tong Ke, Stergios Roumeliotis

TL;DR
This paper introduces a new algebraic method for solving the perspective-3-point problem, enabling accurate and robust camera pose estimation with lower computational cost by directly deriving attitude and position from geometric constraints.
Contribution
The paper presents an algebraic approach that directly computes camera attitude and position from three reference points, improving accuracy and efficiency over previous methods.
Findings
Higher numerical accuracy demonstrated through simulations
Reduced computational cost compared to existing methods
Robust performance in near-singular configurations
Abstract
In this work, we present an algebraic solution to the classical perspective-3-point (P3P) problem for determining the position and attitude of a camera from observations of three known reference points. In contrast to previous approaches, we first directly determine the camera's attitude by employing the corresponding geometric constraints to formulate a system of trigonometric equations. This is then efficiently solved, following an algebraic approach, to determine the unknown rotation matrix and subsequently the camera's position. As compared to recent alternatives, our method avoids computing unnecessary (and potentially numerically unstable) intermediate results, and thus achieves higher numerical accuracy and robustness at a lower computational cost. These benefits are validated through extensive Monte-Carlo simulations for both nominal and close-to-singular geometric…
| position | orientation | |
|---|---|---|
| Kneip’s method | 1.18E-05 | 1.02E-05 |
| Masselli’s method | 1.84E-08 | 4.89E-10 |
| Proposed method | 1.66E-10 | 5.30E-12 |
| Proposed method+Root polishing | 5.07E-11 | 1.53E-13 |
| position | orientation | |
|---|---|---|
| Kneip’s method | 1.42E-14 | 1.34E-14 |
| Masselli’s method | 7.13E-15 | 6.15E-15 |
| Proposed method | 5.16E-15 | 3.73E-15 |
| position | orientation | |
|---|---|---|
| Kneip’s method | 8.10E-14 | 8.85E-14 |
| Masselli’s method | 7.24E-14 | 6.07E-14 |
| Proposed method | 6.73E-14 | 1.75E-14 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobotics and Sensor-Based Localization · Inertial Sensor and Navigation · Satellite Image Processing and Photogrammetry
An Efficient Algebraic Solution to the Perspective-Three-Point Problem
Tong Ke
Stergios Roumeliotis
University of Minnesota
Minneapolis, MN 55455
University of Minnesota
Minneapolis, MN 55455
Abstract
In this work, we present an algebraic solution to the classical perspective-3-point (P3P) problem for determining the position and attitude of a camera from observations of three known reference points. In contrast to previous approaches, we first directly determine the camera’s attitude by employing the corresponding geometric constraints to formulate a system of trigonometric equations. This is then efficiently solved, following an algebraic approach, to determine the unknown rotation matrix and subsequently the camera’s position. As compared to recent alternatives, our method avoids computing unnecessary (and potentially numerically unstable) intermediate results, and thus achieves higher numerical accuracy and robustness at a lower computational cost. These benefits are validated through extensive Monte-Carlo simulations for both nominal and close-to-singular geometric configurations.
1 Introduction
The Perspective-n-Point (PnP) is the problem of determining the 3D position and orientation (pose) of a camera from observations of known point features. The PnP is typically formulated and solved linearly by employing lifting (e.g., [1]), or as a nonlinear least-squares problem minimized iteratively (e.g., [10]) or directly (e.g., [12]). The minimal case of the PnP (for n=3) is often used in practice, in conjunction with RANSAC, for removing outliers [5].
The first solution to the P3P problem was given by Grunert [9] in 1841. Since then, several methods have been introduced, some of which [9, 4, 19, 5, 17, 8] were reviewed and compared, in terms of numerical accuracy, by Haralick et al. [11]. Common to these algorithms is that they employ the law of cosines to formulate a system of three quadratic equations in the features’ distances from the camera. They differ, however, in the elimination process followed for arriving at a univariate polynomial. Later on, Quan and Lan [21] and more recently Gao et al. [6] employed the same formulation but instead used the Sylvester resultant [3] and Wu-Ritz’s zero-decomposition method [23], respectively, to solve the resulting system of equations, and, in the case of [6], to determine the number of real solutions. Regardless of the approach followed, once the feature’s distances have been computed, finding the camera’s orientation, expressed as a unit quaternion [13] or a rotation matrix [14], often requires computing the eigenvectors of a matrix (e.g., [21]) or performing singular value decomposition (SVD) of a matrix (e.g., [6]), respectively, both of which are time-consuming. Furthermore, numerical error propagation from the computed distances to the rotation matrix significantly reduces the accuracy of the computed pose estimates.
To the best of our knowledge, the first method111Nister and Stewenius [20] also follow a geometric approach for solving the generalized P3P resulting into an octic univariate polynomial whose odd monomials vanish for the case of the central P3P. that does not employ the law of cosines in its P3P problem formulation is that of Kneip et al. [15], and later on that of Masselli and Zell [18]. Specifically, [15] and [18] follow a geometric approach for avoiding computing the features’ distances and instead directly solve for the camera’s pose. In both cases, however, several intermediate terms (e.g., tangents and cotangents of certain angles) need to be computed, which negatively affect the speed and numerical precision of the resulting algorithms.
Similar to [15] and [18], our proposed approach does not require first computing the features’ distances. Differently though, in our derivation, we first eliminate the camera’s position and the features’ distances to result into a system of three equations involving only the camera’s orientation. Then, we follow an algebraic process for successively eliminating two of the unknown 3-dof and arriving into a quartic polynomial. Our algorithm (summarized in Alg. 1) requires fewer operations and involves simpler and numerically more stable expressions, as compared to either [15] or [18], and thus performs better in terms of efficiency, accuracy, and robustness. Specifically, the main advantages of our approach are:
- •
Our algorithm’s implementation takes about 40% of the time required by the current state of the art [15]. 222Although Masselli and Zell [18] claim that their algorithm runs faster than Kneip et al.’s [15], our results (see Section 3) show the opposite to be true (by a small margin). The reason we arrive at a different conclusion is that our simulation randomly generates a new geometric configuration for each run, while Masselli employs only one configuration during their entire simulation, in which they save time due to caching.
- •
Our method achieves better accuracy than [15, 18] under nominal conditions. Moreover, we are able to further improve the numerical precision by applying root polishing to the solutions of the quartic polynomial while remaining faster than [15, 18].
- •
Our algorithm is more robust than [15, 18] when considering close-to-singular configurations (the three points are almost collinear or very close to each other).
The remaining of this paper is structured as follows. Section 2 presents the definition of the P3P problem, as well as our derivations for estimating first the orientation and then the position of the camera. In Section 3, we assess the performance of our approach against [15] and [18] in simulation for both nominal and singular configurations. Finally, we conclude our work in Section 4.
2 Problem Formulation and Solution
2.1 Problem Definition
Given the positions, , of three known features , with respect to a reference frame , and the corresponding unit-vector, bearing measurements, , , our objective is to estimate the position, , and orientation, i.e., the rotation matrix , of the camera .
2.2 Solving for the orientation
From the geometry of the problem (see Fig. 1), we have (for ):
[TABLE]
where is the distance between the camera and the feature .
In order to eliminate the unknown camera position, , and feature distance, , we subtract pairwise the three equations corresponding to (1) for and , and project them on the vector to yield the following system of 3 equations in the unknown rotation :
[TABLE]
Next, and in order to compute one of the 3 unknown degrees of rotational freedom, we introduce the following factorization of :
[TABLE]
where333 denotes the rotation matrix describing the rotation about the unit vector, , by an angle . Note that in the ensuing derivations, all rotation angles are defined using the left-hand rule.
[TABLE]
Substituting (5) in (2), yields a scalar equation in the unknown :
[TABLE]
which we solve by employing Rodrigues’ rotation formula [16]:444 denotes the skew-symmetric matrix corresponding to such that , . Note also that if is a unit vector, then , while for two vectors , , . Lastly, it is easy to show that
[TABLE]
to get
[TABLE]
Note that we only need to consider one of these two solutions [in our case, we select ; see Fig. 2], since the other one will result in the same (see Appendix 5.2 for a formal proof).
In what follows, we describe the process for eliminating from (3) and (4), and eventually arriving into a quartic polynomial involving a trigonometric function of . To do so, we once again substitute in (3) and (4) the factorization of defined in to get (for ):
[TABLE]
where
[TABLE]
and employ the following property of rotation matrices
[TABLE]
to rewrite (10) in a simpler form as
[TABLE]
where
[TABLE]
The last equality in (13) is geometrically depicted in Fig. 2 and algebraically derived in Appendix 5.1. Analogously, it is straightforward to show that
[TABLE]
Next, by employing Rodrigues’ rotation formula [see (8)], for expressing the product of a rotation matrix and a vector as a linear function of the unknown , i.e.,
[TABLE]
in (12) yields (for ):
[TABLE]
Expanding (15) and rearranging terms, yields (for )
[TABLE]
Notice that the term appears three times in (2.2), and
[TABLE]
This motivates to rewrite (12) as (for ):
[TABLE]
where
[TABLE]
To simplify the equation analogous to (2.2) that will result from (18) [instead of (2.2)], we seek to find a (not necessarily unique) such that , and hence, [see (17)], i.e.,
[TABLE]
[TABLE]
where
[TABLE]
and thus [from (19) using (8)]
[TABLE]
Now, we can expand (18) using (14) to get an equation analogous to (2.2):
[TABLE]
Substituting [see (17)] in (24) and renaming terms, yields (for ):
[TABLE]
where555The simplified expressions for the following terms, shown after the second equality, require lengthy algebraic derivations which we omit due to space limitations.
[TABLE]
For , (25) results into the following system:
[TABLE]
Note that since , we can further simplify (26) by introducing , where
[TABLE]
Replacing by in (26), we have
[TABLE]
where
[TABLE]
From (28), we have
[TABLE]
Computing the norm of both sides of (2.2), results in
[TABLE]
which is a 4th-order polynomial in that can be compactly written as:
[TABLE]
with
[TABLE]
We compute the roots of (38) in closed form to find . Similarly to [15] and [18], we employ Ferrari’s method [2] to attain the resolvent cubic of (38), which is subsequently solved by Cardano’s formula [2]. Once the (up to) four real solutions of (38) have been determined, an optional step is to apply root polishing following Newton’s method, which improves accuracy for minimal increase in the processing cost (see Section 3.2). Regardless, for each solution of , we will have two possible solutions for , i.e.,
[TABLE]
which, in general, will result in two different solutions for . Note though that only one of them is valid if we use the fact that (see Appendix 5.3).
Next, for each pair of , we compute and from (2.2), which can be written as
[TABLE]
Lastly, instead of first computing from (19) and from (27) to find using (5), we hereafter describe a faster method for recovering . Specifically, from (5), (12) and (18), we have
[TABLE]
Since is perpendicular to , we can construct a rotation matrix such that
[TABLE]
and hence
[TABLE]
where
[TABLE]
Substituting (54) in (53), we have
[TABLE]
where
[TABLE]
The advantages of (55) are: (i) The matrix product can be computed analytically; (ii) are invariant to the (up to) four possible solutions and thus, we only need to construct them once.
2.3 Solving for the position
Substituting in (1) the expression for from (62) and rearranging terms, yields
[TABLE]
Note that we only use (1) for to compute from . Alternatively, we could find the position using a least-squares approach based on (1) for (see Appendix 5.4), if we care more for accuracy than speed. Lastly, the proposed P3P solution is summarized in Alg. 1.
3 Experimental results
Our algorithm is implemented666Our code is submitted along with the paper as supplemental material. in C++ using the same linear algebra library, TooN [22], as [15]. We employ simulation data to test our code and compare it to the solutions of [15] and [18]. For each single P3P problem, we randomly generate three 3D landmarks, which are uniformly distributed in a cuboid centered around the origin. The position of the camera is , and its orientation is .
3.1 Numerical accuracy
We generate simulation data without adding any noise or rounding error to the bearing measurements, and run all three algorithms on 50,000 randomly-generated configurations to assess their numerical accuracy. Note that the position error is computed as the norm of the difference between the estimate and the ground truth of . As for the orientation error, we compute the rotation matrix that transforms the estimated to the true one, convert it to the equivalent axis-angle representation, and use the absolute value of the angle as the error. Since there are multiple solutions to a P3P problem, we compute the errors for all of them and pick the smallest one (i.e., the root closest to the true solution).
The distributions and the means of the position and orientation errors are depicted in Fig.s 3 - 4 and Table 1. As evident, we get similar results to those presented in [18] for Kneip et al.’s [15] and Masselli and Zell’s methods [18], while our approach outperforms both of them by two orders of magnitude in terms of accuracy. This can be attributed to the fact that our algorithm requires fewer operations and thus exhibits lower numerical-error propagation.
Furthermore, and as shown in the results of Table 1, we can further improve the numerical precision by applying root polishing. Typically, two iterations of Newton’s method [24] lead to significantly better results, especially for the orientation, while taking only 0.01 s per iteration, or about 4% of the total processing time.
3.2 Processing cost
We use a test program that solves 100,000 randomly generated P3P problems and calculates the total execution time to evaluate the computational cost of the three algorithms considered. We run it on a 2.0 GHz4 Core laptop and the results show that our code takes 0.54 s on average (0.52 s without root polishing) while [15] and [18] take 1.3 s and 1.5 s, respectively. This corresponds to a 2.5 speed up (or 40% of the time of [15]). Note also, in contrast to what is reported in [18], Masselli’s method is actually slower than Kneip’s. As mentioned earlier, Masselli’s results in [18] are based on 1,000 runs of the same features’ configuration, and take advantage of data caching to outperform Kneip.
3.3 Robustness
There are two typical singular cases that lead to infinite solutions in the P3P problem:
- •
Singular case 1: The 3 landmarks are collinear.
- •
Singular case 2: Any two of the 3 bearing measurements coincide.
In practice, it is almost impossible for these conditions to hold exactly, but we may still have numerical issues when the geometric configuration is close to these cases. To test the robustness of the three algorithms considered, we generate simulation data corresponding to small perturbations (uniformly distributed within ) of the features’ positions when in singular configurations. The errors are defined as in Section 3.1, while we compute the medians of them to assess the robustness of the three methods. For fairness, we do not apply root polishing to our code here. According to the results shown in Fig.s 5 - 8 and Tables 2 - 3, our method achieves the best accuracy in these two close-to-singular cases. The reason is that we do not compute any quantities that may suffer from numerical issues, such as cotangent and tangent in [15] and [18], respectively.
4 Conclusion and Future Work
In this paper, we have introduced an algebraic approach for computing the solutions of the P3P problem in closed form. Similarly to [15] and [18], our algorithm does not solve for the distances first, and hence reduces numerical-error propagation. Differently though, it does not involve numerically-unstable functions (e.g., tangent, or cotangent) and has simpler expressions than the two recent alternative methods [15, 18], and thus it outperforms them in terms of speed, accuracy, and robustness to close-to-singular cases.
As part of our ongoing work, we are currently extending our approach to also address the case of the generalized (non-central camera) P3P [20].
5 Appendix
5.1 Proof of
First, note that is a unit vector since is perpendicular to . Also, from (13) and (7) we have
[TABLE]
Then, we can prove by showing that their inner product is equal to 1:
[TABLE]
5.2 Equivalence between the two solutions of
When solving for [see (9)], we have two possible solutions and . Next, we will prove that using to find is equivalent to using . First, note that (see Fig. 2)
[TABLE]
Then, we can write as
[TABLE]
If we use to find ,
[TABLE]
Note that in (60) is of the same form as that in (5), so any solutions of computed using will be found by using . Thus, we do not need to consider any other solutions for and beyond the ones found for .
5.3 Determining the sign of
From (51), we have two solutions for , and thus for , with . This will also result into two solutions for [see (52)] and, hence, two solutions for : and . Considering these two options, we get two distinct solutions for [see (53)]:
[TABLE]
Then, notice that
[TABLE]
If , this would require
[TABLE]
which cannot be true, hence and cannot be equal. Thus, there are always two different solutions of .
If, however, we use the fact that is positive, we can determine the sign of , and choose the valid one among the two solutions of . Subtracting (1) pairwise for () from (), we have
[TABLE]
Multiplying both sides of (5.3) with from the left, yields
[TABLE]
Using the fact that and , we select the sign of to be the same as that of .
5.4 Least-squares solution for the position
can also be solved following a least-squares approach, which is slower but more accurate than (56). Specifically, (1) can result in the following system:
[TABLE]
Then, we only need to apply QR decomposition [7] and backsolve for (i.e., we do not need to compute ).
5.5 Derivation of
[TABLE]
5.6 Derivation of and
First, note that
[TABLE]
Let , and thus
[TABLE]
[TABLE]
[TABLE]
Then, from (26) and (28), we derive the expressions of :
[TABLE]
Additionally, we can derive the expression of , which is defined in (55):
[TABLE]
5.7 Comparison with the P3P code in OpenCV
We also compared the performance of our code with that in OpenCV (based on [6]), using the same setup as Sec. 3. The error distributions are showed in Fig. 9 and Fig. 10. It is obvious that the code in OpenCV has lower numerical accuracy comparing to ours. Also, it takes around 3 s on average to compute P3P once, which is much slower than ours (0.52 s according to Sec. 3).
In conclusion, our code performs much better than the one in OpenCV.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] A. Ansar and K. Daniilidis. Linear pose estimation from points or lines. IEEE Transactions on Pattern Analysis and Machine Intelligence , 25(5):578–589, 2003.
- 2[2] G. Cardano, T. R. Witmer, and O. Ore. The Rules of Algebra: Ars Magna , volume 685. Courier Corporation, 2007.
- 3[3] D. A. Cox, J. Little, and D. O’Shea. Using algebraic geometry , volume 185. Springer Science & Business Media, 2006.
- 4[4] S. Finsterwalder and W. Scheufele. Das rückwärtseinschneiden im raum . Verlag d. Bayer. Akad. d. Wiss., 1903.
- 5[5] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM , 24(6):381–395, 1981.
- 6[6] X.-S. Gao, X.-R. Hou, J. Tang, and H.-F. Cheng. Complete solution classification for the perspective-three-point problem. IEEE Transactions on Pattern Analysis and Machine Intelligence , 25(8):930–943, 2003.
- 7[7] G. H. Golub and C. F. Van Loan. Matrix computations , volume 3. JHU Press, 2012.
- 8[8] E. W. Grafarend, P. Lohse, and B. Schaffrin. Dreidimensionaler rückwärtsschnitt teil i: Die projektiven gleichungen. Zeitschrift für Vermessungswesen , pages 1–37, 1989.
