Ideals of the Multiview Variety
Sameer Agarwal, Andrew Pryhuber, Rekha Thomas

TL;DR
This paper investigates the algebraic structure of the multiview variety in computer vision, establishing when certain polynomial sets generate its ideal and clarifying relationships among various proposed ideals.
Contribution
It proves that bifocal and trifocal polynomials generate the multiview ideal under distinct foci and clarifies algebraic relationships among different polynomial ideals in multiview geometry.
Findings
Bifocal and trifocal polynomials generate the multiview ideal with distinct foci.
The multiview ideal is obtained by saturating bifocal polynomials when foci are noncoplanar.
All considered ideals coincide when dehomogenized, describing the space of finite images.
Abstract
The multiview variety of an arrangement of cameras is the Zariski closure of the images of world points in the cameras. The prime vanishing ideal of this complex projective variety is called the multiview ideal. We show that the bifocal and trifocal polynomials from the cameras generate the multiview ideal when the foci are distinct. In the computer vision literature, many sets of (determinantal) polynomials have been proposed to describe the multiview variety. We establish precise algebraic relationships between the multiview ideal and these various ideals. When the camera foci are noncoplanar, we prove that the ideal of bifocal polynomials saturates to give the multiview ideal. Finally, we prove that all the ideals we consider coincide when dehomogenized, to cut out the space of finite images.
Pryhuber and Thomas were partially supported by the NSF grant DMS-1719538
1. Introduction
A general projective camera is a rank three matrix in . Given a camera arrangement , the image formation map
[TABLE]
sends a homogenized world point to its images . The th copy of in the codomain of is the homogenized image plane of camera . The unique point in the kernel of is the focal point of camera . The map is defined at all points in except at the foci . Triggs called the joint image [24] and Heyden-Åström call it the natural descriptor [12]. We are interested in studying the complete set of polynomials that vanish on .
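The image formation map can be sketched numerically. The following is a minimal illustration, assuming two hypothetical cameras `A1` and `A2` with made-up integer entries (they are not taken from the paper); it shows a world point being sent to its pair of images and checks that each focal point lies in the kernel of its camera, which is why the map is undefined there.

```python
def matvec(a, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(r * x for r, x in zip(row, v)) for row in a]

# Two hypothetical rank-three 3x4 cameras (illustrative values only).
A1 = [[1, 0, 0, -1],
      [0, 1, 0, -2],
      [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 1]]
CAMERAS = [A1, A2]

def image_formation(cameras, q):
    """Send a homogeneous world point q to its tuple of homogeneous images."""
    return [matvec(a, q) for a in cameras]

q = [2, 5, 7, 1]
images = image_formation(CAMERAS, q)

# Each focal point spans the kernel of its camera.
focus1 = [1, 2, 3, 1]
focus2 = [0, 0, -1, 1]
kernel_checks = [matvec(A1, focus1), matvec(A2, focus2)]
```

Scaling `q` by a nonzero constant scales every image by the same constant, so the map is well defined on projective points away from the foci.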
Definition 1.1.
Given a set , the collection of all polynomials in that vanish on is a homogeneous ideal, known as the vanishing ideal of , and denoted as . The variety is the smallest complex projective variety that contains , known as the Zariski closure of .
We refer the reader to [6] for the basics on ideals and varieties. In this paper we will be interested in the vanishing ideal of the joint image .
Definition 1.2.
The multiview ideal of , denoted , is the vanishing ideal of in where are the coordinates on the th copy of . The Zariski closure of in is the complex projective variety , which we call the multiview variety of .
The terminology multiview ideal and multiview variety comes from [2]. Following Triggs [24], Trager et al. refer to the multiview variety as the joint image variety.
Starting with the seminal work of Longuet-Higgins [16], researchers have studied various systems of polynomials that vanish on . In the computer vision literature these equations are known as multiview constraints [19, 7, 11, 17, 12]. Obviously, the ideals generated by these systems of polynomials are contained in . However, there hasn’t been much discussion of whether these polynomials generate since the focus of all these papers has been on the multiview variety and not its vanishing ideal. The aim of this paper is to provide a complete description of the multiview ideal and study its relationship to the above sets of polynomials.
It can be difficult to determine the vanishing ideal of a variety. However, there are various advantages to knowing it. To be able to do any computations with a variety or to study its structure using algebra, we need a description in terms of polynomials and the vanishing ideal is the optimal algebraic description. This manifests itself in a number of ways.
The set of all polynomial functions on is precisely , known as the coordinate ring of . In particular, a polynomial vanishes on if and only if belongs to . Knowledge of a generating set of also informs us about the local structure of , since a point is smooth if and only if the Jacobian matrix has rank equal to the codimension of . More generally, if is a projective variety then carries all the geometric information about allowing algebra (and algebraic algorithms) to infer geometric properties of . For example, the dimension and degree of can be read off from the Hilbert polynomial of which also carries many more sophisticated invariants of . See [6] for all the above.
In multiview geometry, many estimation problems can be phrased as polynomial optimization problems over varieties [13, 2]. In particular, the triangulation problem under Gaussian noise amounts to projecting a point onto the multiview variety [1].
In general, polynomial optimization on a variety boils down to certifying the non-negativity of a polynomial on by expressing it as a sum-of-squares (sos) modulo an ideal vanishing on [3]. This means finding a sos polynomial such that lies in . This expressibility is maximized, and the algorithms terminate in the lowest possible degree, when . We illustrate this on a very small example.
Example 1.3.
The polynomial is non-negative on . The ideal cuts out but . Now allowing as the sos certificate. On the other hand, if then has to have degree at least ; for instance .
The above phenomenon can have a major impact on the number of rounds of convex relaxations needed to solve a polynomial optimization problem such as the well-known Lasserre/sos hierarchies [14, 20], where each round looks for sos certificates of a fixed degree with degrees increasing monotonically with rounds. In each round the semidefinite program being solved is of size , where is the number of variables and is the degree in that round. As a result, in many cases only the first round may be computationally feasible, and having access to can make the difference between the problem being tractable or not.
The rest of the paper is structured as follows. After a brief discussion of the notation used in this paper we begin in Section 2 by introducing a family of ideals associated with every camera arrangement which we call the -focal ideals. We describe how these ideals behave under change of coordinates, and dispel the popular myth that, under a change of image coordinates, -focal polynomials go to -focal polynomials. In Section 3, we prove our first main theorem (Theorem 3.7), that the well-known bifocal (epipolar constraints) and trifocal polynomials generate when the camera foci in are distinct. Next, in Section 4, we consider three different types of determinantal polynomials proposed to cut out the multiview variety by Heyden-Åström [12], Faugeras et al. [7] and Ma et al. [17]. We show that while the ideals they generate are all contained in , none of them actually coincide with . We establish their precise algebraic relationship with . In Section 5, we consider the relationship of the multiview ideal to bifocal polynomials and prove the algebraic analog of the statement that the bifocal polynomials cut out the multiview variety when the camera foci are noncoplanar. In Section 6, we study how the various ideals relate to each other when we restrict our attention to finite images, i.e. exclude points at infinity. We conclude in Section 7 with a summary.
Many results in this paper require explicit computation. We recommend the reader have a copy of Macaulay2 [9] (or equivalent symbolic algebra software) handy. The Macaulay2 code for our computations can be found at https://sites.math.washington.edu/~thomas/papers/Multiview_Ideal.zip
1.1. Notation
In the rest of the paper, we will use to denote . The ideal generated by the polynomials will be denoted as .
We will use for cameras and for matrices in . and will denote arrangements of corresponding matrices. Bold, lower-case roman letters will be used to indicate vectors, and lower-case greek letters will be used for functions. Given a partial symbolic matrix , will denote the ideal generated by all minors of the matrix . The symbol denotes the set and denotes the set of all size subsets of .
2. The -focal ideals of a camera arrangement
Let be the tuple of variables denoting the coordinates associated to the projective plane corresponding to the th camera image. Write , and consider the partially symbolic matrix
[TABLE]
Let denote the evaluation of at . If then there exists some and scalars such that for all . Therefore, has a non-trivial kernel since it contains the point , and hence the maximal minors of , which are polynomials in , vanish on . Since was arbitrary, these maximal minors vanish on all of and on the multiview variety. Therefore,
[TABLE]
In this section, we describe further minors of and the ideals they generate, which will play an important role in the description of .
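The rank argument above can be checked numerically. This is a minimal sketch, assuming the matrix stacks one block of the form [A_i | 0 ... x_i ... 0] per camera (the displayed matrix is not reproduced here, so the layout and sign convention are assumptions) and using three hypothetical cameras. All maximal minors vanish at the images of a world point, and at least one is nonzero once a single image is perturbed.

```python
from itertools import combinations

def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def matvec(a, v):
    return [sum(r * x for r, x in zip(row, v)) for row in a]

def joint_matrix(cameras, images):
    """Stack blocks [A_i | 0 .. x_i .. 0] into a 3n x (n+4) matrix (assumed layout)."""
    n = len(cameras)
    rows = []
    for i, (a, x) in enumerate(zip(cameras, images)):
        for r in range(3):
            pad = [0] * n
            pad[i] = x[r]
            rows.append(list(a[r]) + pad)
    return rows

def maximal_minors(m):
    """All minors that keep as many rows as there are columns."""
    width = len(m[0])
    return [det([m[r] for r in rows])
            for rows in combinations(range(len(m)), width)]

# Three hypothetical cameras with pairwise distinct foci (illustrative values).
A1 = [[1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
A3 = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
CAMERAS = [A1, A2, A3]

q = [2, 5, 7, 1]
images = [matvec(a, q) for a in CAMERAS]           # consistent images of q
perturbed = [images[0], images[1], [7, 12, 9]]     # third image moved off

minors_on = maximal_minors(joint_matrix(CAMERAS, images))
minors_off = maximal_minors(joint_matrix(CAMERAS, perturbed))
```

Exact integer arithmetic is used so that "vanishes" means exactly zero rather than small in floating point.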
Definition 2.1.
For a subset where , consider the partially symbolic matrix
[TABLE]
of size . A maximal minor of is called a -focal polynomial of . The -focal ideal of , , is the ideal sum
[TABLE]
Trager et al. also study the -focal polynomials and refer to them as -linearities [21, 22]. Note that every -focal polynomial is multilinear and of total degree . Such a minor involves choosing rows of , and by a pigeonhole argument, at most four cameras may contribute more than one row to the minor when . Indeed, if more than four cameras contributed at least two rows each, then at least rows are accounted for, which leaves at most rows to take from the remaining cameras. So at least one camera will be left out entirely which means that the submatrix of that minor has a zero column and the minor is zero.
A useful fact for us will be that for two positive integers , there is a simple way to “bump up” a -focal polynomial to an -focal polynomial by multiplying the -focal polynomial with a monomial.
Lemma 2.2.
Suppose is a -focal polynomial from cameras where . For any cameras , there is a -focal polynomial such that for any choice of variables , one for each camera.
Proof.
Add the row and column associated to coordinate to for as follows
[TABLE]
Taking the determinant of this matrix yields the -focal polynomial . ∎
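The bump-up construction can be illustrated in the smallest case, going from a 2-focal to a 3-focal polynomial. The sketch below assumes the block layout [A_i | x_i] for the focal matrices and uses three hypothetical cameras; it appends one row of a third camera, together with the column carrying its image coordinate, and checks that the new determinant equals that coordinate times the original bifocal.

```python
def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def bifocal_matrix(a1, a2, x1, x2):
    """6x6 matrix [A1 x1 0; A2 0 x2] (assumed layout)."""
    rows = [list(a1[r]) + [x1[r], 0] for r in range(3)]
    rows += [list(a2[r]) + [0, x2[r]] for r in range(3)]
    return rows

def bumped_matrix(a1, a2, a3, x1, x2, x3, j):
    """Append row j of camera 3 and a new column holding the coordinate x3[j]."""
    rows = [row + [0] for row in bifocal_matrix(a1, a2, x1, x2)]
    rows.append(list(a3[j]) + [0, 0, x3[j]])
    return rows

# Hypothetical cameras and arbitrary (not necessarily consistent) image points.
A1 = [[1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
A3 = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
x1, x2, x3 = [1, 2, 3], [4, 5, 6], [7, 11, 13]

g = det(bifocal_matrix(A1, A2, x1, x2))
bumped = [det(bumped_matrix(A1, A2, A3, x1, x2, x3, j)) for j in range(3)]
```

The new column has a single nonzero entry in the selected rows, so expanding the determinant along it gives the monomial multiple directly.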
Combining the above facts we get that any -focal polynomial for is of the form where is a focal polynomial. This is a generalization of Proposition 2 in [21] that showed that every -focal polynomial is a monomial multiple of a -focal polynomial for . As a result, we will primarily focus on the ideals , , and , called the bifocal, trifocal, and quadrifocal ideals of .
A closer look at reveals that it is the ideal generated by the epipolar constraints, since is a matrix, whose determinant is the epipolar constraint between images and . By Lemma 2.2, contains the bumped up version of and for every triplet of images , the 27 trifocals implied by the three trifocal tensors relating them. And finally, contains the bumped up versions of and and the 81 quadrifocals implied by the quadrifocal tensor. The fact that we only need to study , , and lines up with the well known fact in multiview geometry that when studying -view constraints, one only needs to study the epipolar matrix, the trifocal tensor and the quadrifocal tensor. See Chapter 17 in the book by Hartley & Zisserman [11] for explicit computations of the generators of and and their history.
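To make the link between the bifocal determinant and the epipolar constraint concrete, the following sketch reads off a 3x3 coefficient matrix `F` from the 6x6 determinant by evaluating it at unit vectors. The cameras are hypothetical, the block layout [A_i | x_i] is an assumption, and `F` agrees with the usual fundamental matrix only up to sign and transpose conventions.

```python
def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def bifocal(a1, a2, x1, x2):
    """Determinant of the 6x6 matrix [A1 x1 0; A2 0 x2] (assumed layout)."""
    rows = [list(a1[r]) + [x1[r], 0] for r in range(3)]
    rows += [list(a2[r]) + [0, x2[r]] for r in range(3)]
    return det(rows)

# Hypothetical cameras with distinct foci (illustrative values only).
A1 = [[1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
UNIT = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# F[b][a] is the coefficient of x1[a] * x2[b] in the bifocal polynomial.
F = [[bifocal(A1, A2, UNIT[a], UNIT[b]) for a in range(3)] for b in range(3)]

def epipolar_form(x1, x2):
    """Evaluate the bilinear form x2^T F x1."""
    return sum(x2[b] * F[b][a] * x1[a] for a in range(3) for b in range(3))
```

Because the determinant is linear in each of the two image columns, the nine values at unit vectors determine the whole polynomial; `F` is singular since the epipole in the first image lies in its kernel.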
In the remainder of this section, we will investigate how -focal ideals transform under certain linear transformations on cameras. It is widely known that, from image data, the geometry of a camera arrangement can only be determined up to an arbitrary choice of coordinates. This is reflected in the following lemma.
Lemma 2.3 (Projective Ambiguity).
Suppose . Then for any , where .
Proof.
This follows since for any -element subset which implies that any -focal of differs from the same -focal of by a factor of . ∎
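The determinant factor in this proof can be verified on a small instance. The sketch below assumes the block layout [A_i | x_i] for the focal matrix, three hypothetical cameras, and a hypothetical invertible 4x4 matrix `H`; right-multiplying every camera by `H` multiplies each maximal minor by det(H).

```python
from itertools import combinations

def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def joint_matrix(cameras, images):
    """Stack blocks [A_i | 0 .. x_i .. 0] into a 3n x (n+4) matrix (assumed layout)."""
    n = len(cameras)
    rows = []
    for i, (a, x) in enumerate(zip(cameras, images)):
        for r in range(3):
            pad = [0] * n
            pad[i] = x[r]
            rows.append(list(a[r]) + pad)
    return rows

def maximal_minors(m):
    width = len(m[0])
    return [det([m[r] for r in rows])
            for rows in combinations(range(len(m)), width)]

# Hypothetical data: three cameras, a world coordinate change H, arbitrary images.
A1 = [[1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
A3 = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
CAMERAS = [A1, A2, A3]
H = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
xs = [[1, 2, 3], [4, 5, 6], [7, 11, 13]]

minors_a = maximal_minors(joint_matrix(CAMERAS, xs))
minors_ah = maximal_minors(joint_matrix([matmul(a, H) for a in CAMERAS], xs))
```

The two lists are indexed by the same row subsets, so corresponding entries can be compared directly.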
From the proof of Lemma 2.3, we see that a coordinate change that sends maps -focals to -focals, picking up only a scalar factor . We will now see that a change of coordinates on the image planes affects the -focals in a more subtle way.
Let be a sequence of invertible matrices and consider the camera arrangement obtained from a given arrangement by left-multiplying with . Note that the focal point of the camera is the same as the focal point of the camera . Since , we denote the ring by and a polynomial in it by . The sequence induces a camera-wise linear change of coordinates on by sending
[TABLE]
Note that this amounts to a change of coordinates in the image planes of the cameras in . Let denote . In what follows we will also need the notation , and .
To analyze the effect of on -focal ideals, we recall the classical Cauchy-Binet formula, a proof of which can be found in [4].
Lemma 2.4 (Cauchy-Binet).
If and are rectangular matrices of size and , respectively, where , then the determinant of the square matrix is:
[TABLE]
where indicates that all rows/columns are taken.
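For reference, the classical Cauchy-Binet formula can be written as follows. The symbols here are assumptions chosen for this sketch and need not match the paper's notation: $P$ is an $m \times n$ matrix, $Q$ is an $n \times m$ matrix with $m \le n$, $S$ ranges over the $m$-element subsets of $[n]$, and $[:]$ indicates that all rows or all columns are taken.

```latex
\det(PQ) \;=\; \sum_{S \in \binom{[n]}{m}} \det\!\left(P_{[:],\,S}\right)\,\det\!\left(Q_{S,\,[:]}\right)
```

When $m = n$ the sum has a single term and the formula reduces to $\det(PQ) = \det(P)\det(Q)$.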
Lemma 2.5.
For the -focal ideal , . Similarly, .
Proof.
We prove the first statement and the other follows similarly. We will show that the -focal ideal of is sent to the -focal ideal of . The result then follows for the full -focal ideal by summing the -focal ideals of all as varies over all -subsets of .
Recall that a -focal polynomial of is a maximal minor of:
[TABLE]
Applying to this maximal minor is the same as taking the same maximal minor of
[TABLE]
The corresponding -focal polynomial of is the same maximal minor of
[TABLE]
The ideal is generated by the maximal minors of , namely
[TABLE]
while is generated by the maximal minors of . We need to show that these ideals coincide.
Let denote the block diagonal matrix with blocks . A -minor of is the determinant of a submatrix with rows indexed by some . Such a submatrix has the form where is the submatrix of consisting of the rows of indexed by . By the Cauchy-Binet formula,
[TABLE]
This implies that lies in the ideal , and hence, .
The reverse containment follows by applying the same argument to and where is the block diagonal matrix with blocks .
Summing over all camera subsets, the result follows:
[TABLE]
∎
This proof shows that, contrary to popular belief, it is not true that -focal polynomials go to -focal polynomials under the change of coordinates given by , but the ideals do as in Lemma 2.5.
3. The Multiview Ideal
Recall from Definition 1.2 that the multiview ideal of the camera arrangement is the vanishing ideal of , meaning that it is the set of all polynomials in that vanish on . Since is a subset of , is, in fact, generated by polynomials with real coefficients. (Footnote 1: Let be a complex polynomial, where and are real polynomials. If vanishes on a set of real points, then so must and .)
The complex projective variety , which is the complex Zariski closure of , is the multiview variety of . One might wonder if it is better to study the real Zariski closure of and its vanishing ideal since complex points in the multiview variety do not have any physical meaning, and hence no relevance to multiview geometry. However, observe that if the real Zariski closure were strictly smaller than the set of real points in , then there would be a polynomial not in that vanishes on , which would contradict that is the vanishing ideal of . Therefore, is also the vanishing ideal of the real Zariski closure of , and hence a real radical ideal [18, §12.5].
Further, since is a polynomial map and is irreducible, is an irreducible three-dimensional variety in . Hence is a prime (homogeneous) ideal, meaning that if then either or is in .
It was shown in [2] that the bifocals, trifocals and quadrifocals of form a universal Gröbner basis of under a certain genericity assumption on the cameras. This means that this collection of polynomials forms a Gröbner basis for with respect to any term order [6]. We will use this result to establish a generating set for when the camera foci are distinct.
We first note what happens to under the change of coordinates defined in the previous section. Recall that sends a polynomial to .
Lemma 3.1.
The image of the multiview ideal under the map is , the multiview ideal of , i.e., . Similarly, .
Proof.
Again, we will prove that . The proof that is similar.
From the definition we see that a polynomial vanishes on the multiview variety if and only if for all , equivalently, if and only if
[TABLE]
for all . The multiview variety of is the Zariski closure of the points as varies over . Therefore, vanishes on if and only if vanishes on . This proves that .
To finish the proof we need to argue that if then for some . A polynomial if and only if for all if and only if for all . Define . Then . ∎
We will use the results obtained so far to give an elementary proof that the bifocals and trifocals generate the multiview ideal for any arrangement of cameras with pairwise distinct foci. An important tool will be translational cameras.
Definition 3.2.
A camera is said to be translational if its left block is the identity matrix, i.e., , for some .
Lemma 3.3.
If is an arrangement of translational cameras, then .
Proof.
Using Macaulay2, this statement can be checked for translational cameras with foci represented symbolically as . For , since and , the statement follows. ∎
We now use translational cameras to show that the quadrifocals are not needed in a generating set of . This is done by extending the result for translational cameras to finite cameras. Recall that a finite camera is a camera whose left block is invertible, or equivalently a camera whose focal point is not a point at infinity. Observe that any finite camera can be obtained by multiplying some translational camera on the left by an invertible matrix.
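The relationship between finite and translational cameras can be seen on a small example. This sketch uses a hypothetical invertible 3x3 matrix `G` and a hypothetical translation `t`; it builds the translational camera `T = [I | t]`, forms the finite camera `A = G T`, and checks that both share the same focal point.

```python
def matvec(a, v):
    return [sum(r * x for r, x in zip(row, v)) for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def translational_camera(t):
    """Return the 3x4 camera [I | t]."""
    return [[1 if r == c else 0 for c in range(3)] + [t[r]] for r in range(3)]

# Hypothetical data: G is invertible (upper triangular with nonzero diagonal).
G = [[2, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
t = [1, 2, 3]

T = translational_camera(t)
A = matmul(G, T)            # finite camera with left block G

# The focal point of [I | t] is (-t, 1); left multiplication by G preserves it.
focus = [-t[0], -t[1], -t[2], 1]
```

Going in the other direction, from a given finite camera to its translational counterpart, would require inverting the left block; the example avoids that by starting from `G` and `t`.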
Corollary 3.4.
If is any arrangement of cameras, then .
Proof.
If is an arrangement of finite cameras, then for some . Therefore where is an arrangement of translational cameras. By Lemma 3.3, . Hence, Lemma 2.5 implies
[TABLE]
For any four cameras indexed by , there exists some which takes the foci of off of the plane at infinity, i.e. , so that is an arrangement of finite cameras. Inverting this -coordinate change does not change ideal containment by Lemma 2.3. The general result follows since ∎
To get to our main result, we will need a result from [2] about camera arrangements that are generic in the sense that all minors of are non-zero. We call such an minor-generic.
Corollary 3.5.
Suppose is minor-generic. Then .
Proof.
Theorem 2.1 in [2] says that if is minor-generic, then the bifocals, trifocals and quadrifocals form a universal Gröbner basis of . In particular, this implies that . The statement is then immediate from Corollary 3.4. ∎
Minor-genericity is a purely algebraic condition on camera arrangements. The following statement, which appears as a brief comment in [2] without proof, gives a geometric reinterpretation of this condition.
Lemma 3.6.
If is minor-generic, then the foci of the cameras in are pairwise distinct. Conversely, if the cameras in have pairwise distinct foci, then there exist such that is minor-generic.
Proof.
Let denote the three-dimensional row span of . If and have the same focal point then and hence any four of the six rows of and are linearly dependent and is not minor-generic. This proves the first statement.
Now suppose the foci of cameras in are pairwise distinct. This means that the planes are pairwise distinct. For any , the rows of form a basis of . By choosing appropriately, the three rows of can be sent to any choice of three linearly independent vectors in . We need to show that there is a choice of such that no four rows from the matrices are linearly dependent.
Consider the matrix obtained by vertically stacking the cameras in , as a point in , with coordinates representing the -entry of the th camera. We will identify this point in with the corresponding matrix, and stack of cameras, and call all of them . Let denote the symbolic matrix with entries . For , let denote the determinant of the submatrix of with rows indexed by . These cut out quartic hypersurfaces in . Let denote the normal of the hyperplane . Impose linear conditions saying that the rows of , numbered , dot to zero with . These equations determine a subspace in of dimension at least . The given point lies in . We need to show that there is a choice of such that (which again lies in ) avoids the determinantal surfaces. This is equivalent to picking a basis for each that stack together to a .
We first show that is not contained in any by exhibiting a point in for each . Since at most four cameras can be involved in any , we may assume without loss of generality that involves only rows of the first four cameras. There are four cases to consider depending on how many rows these four cameras contribute to — the possibilities being , , , and . In each case we will produce a . A key observation is that and having distinct foci implies is a proper subspace of both and for all . Our starting point in each case below is which we modify to the needed by replacing the bases of that provide the rows of .
Case 1. (3,1,0,0): Modify to by choosing a basis for to be the three rows of so that no element in this basis lies in . Then does not vanish on .
Case 2. (2,2,0,0): Choose a basis for such that the two rows contributing to from the first camera are chosen from . Then is a proper subspace of of dimension at most one. Therefore taking two linearly independent vectors outside of this subspace as the two rows from creates a that does not vanish on .
Case 3. (2,1,1,0): Choose a basis for such that the two contributing rows from the first camera lie in . Choose the row from such that , which forces to be a proper subspace of . Taking outside this subspace, we get a point at which does not vanish.
Case 4. (1,1,1,1): Choose , , , and . By construction, we get a point in at which does not vanish.
Therefore, is a proper subvariety of for each , and a generic choice of will put . ∎
We note that having distinct foci does not imply that is minor-generic. A simple example would be an arrangement of four translational cameras; the submatrix consisting of the first row of each camera has zero determinant. However, having distinct foci allows the camera arrangement to be made minor-generic by the action of a tuple . We are now ready to prove the main theorem of this section.
Theorem 3.7.
Let be an arrangement of cameras with distinct foci. Then .
Proof.
By Lemma 3.6, there exists such that is minor-generic. Then, by Corollary 3.5, . Therefore, by Lemmas 3.1 and 2.5, we get
[TABLE]
∎
Proposition 5(1) in [21] says that the and together cut out the multiview variety which implies that . Theorem 3.7 shows that these polynomials also generate the multiview ideal providing the analogous ideal-theoretic statement.
Theorem 3.7 improves on Corollary 2.7 in [2] which states that when the foci of the cameras are in linearly general position, then is generated by the bifocals and trifocals. Theorem 3.7 requires no sophisticated condition on the cameras beyond the foci being pairwise distinct.
Conca et al. [5] and Li [15] also consider the vanishing ideal of the image of a linear map from a projective space to a product of projective spaces. It is shown in [5] that this ideal is Cartwright-Sturmfels, meaning that its initial ideal is radical after a generic change of coordinates. Both of these works allow for projective spaces of arbitrary dimension. Specializing to our situation, Li’s results show that while we prove that .
Just like in [21] where the results automatically generalized from projective cameras to Euclidean cameras, Theorem 3.7 also generalizes to Euclidean cameras. Recall that a camera is Euclidean if it is of the form where .
Corollary 3.8.
Let be an arrangement of Euclidean cameras with pairwise distinct foci. Then .
We state one more consequence of Theorem 3.7 which will be needed in the next section.
Corollary 3.9.
Let be a camera arrangement with pairwise distinct foci. Then for any , the points lie in where is the focal point of .
Proof.
By Theorem 3.7, it suffices to show that for any , the bifocals and trifocals vanish on the points . For any pair of cameras , observe that is a nonzero element of . For any pair not containing camera , is a nonzero element of . Hence all polynomials of vanish on . A similar argument applies to any triple of cameras, from which it follows that all polynomials in vanish on . ∎
The image of focal point in image , i.e. , , is called the epipole in image relative to image . Corollary 3.9 shows that while the product of an arbitrary point in image with all epipoles relative to image does not appear in the image of , these points appear in the multiview variety after taking Zariski closure. See also Proposition 1 in [21].
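The role of the epipole can be checked in the two-camera case. The sketch below assumes the block layout [A_i | x_i] for the bifocal matrix and uses two hypothetical cameras; it computes the epipole in the second image as the image of the first focal point and confirms that the bifocal polynomial vanishes when an arbitrary point in the first image is paired with it. Only the bifocal is tested here, not the trifocals.

```python
def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def matvec(a, v):
    return [sum(r * x for r, x in zip(row, v)) for row in a]

def bifocal(a1, a2, x1, x2):
    """Determinant of the 6x6 matrix [A1 x1 0; A2 0 x2] (assumed layout)."""
    rows = [list(a1[r]) + [x1[r], 0] for r in range(3)]
    rows += [list(a2[r]) + [0, x2[r]] for r in range(3)]
    return det(rows)

# Hypothetical cameras; focus1 spans the kernel of A1.
A1 = [[1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
focus1 = [1, 2, 3, 1]

# Epipole in image 2 relative to image 1: the image of camera 1's focal point.
epipole = matvec(A2, focus1)

# Pair several arbitrary points in image 1 with the epipole in image 2.
samples = [[1, 0, 0], [0, 1, 0], [3, -2, 5], [7, 11, 13]]
values = [bifocal(A1, A2, x1, epipole) for x1 in samples]
```

The vanishing follows because the focal point of camera 1, padded with suitable scalars, lies in the kernel of the 6x6 matrix regardless of the point chosen in image 1.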
We conclude this section by showing that the hypothesis in Theorem 3.7 cannot be relaxed, namely if a pair of foci of cameras in coincide, then the multiview ideal is strictly larger than the ideal generated by bifocals and trifocals.
Example 3.10.
Consider the four translational camera arrangement where , , . Eliminating the variables and from the ideal , we can directly obtain . Computing a primary decomposition of , we find that
[TABLE]
The extra component cuts out the points , and from the primary decomposition we see that the projective variety they form is not contained in .
4. More Ideals for the Multiview Variety
In the computer vision literature, there are several sets of polynomials that have been shown to vanish on the space of images , and hence they also vanish on the multiview variety. We now consider three such sets of polynomials and the ideals they generate, and compare them to the multiview ideal .
4.1. Heyden and Åström [12]
Heyden and Åström were the first to do an algebraic study of the multiview variety, by studying the -focal ideal [12]. The variety of this ideal is indeed the multiview variety.
Lemma 4.1.
For any camera arrangement with pairwise distinct foci, .
Proof.
Recall from the image formation equations, for all , that if lies in the image of then the matrix has a non-trivial kernel. This means that all maximal minors of vanish on the image of , and therefore also on its Zariski closure, which is the multiview variety. Therefore, .
To see the reverse inclusion, suppose which means that is rank deficient and there is a nonzero vector of the form in the kernel of . If , then we will get that for all . However, since , it must be that for all and hence the vector in the kernel is the zero vector which is a contradiction. Therefore, there is a nonzero vector such that for some . If is not the focal point of any camera, then lies in . Since is continuous, . It follows that because and so . On the other hand, if is the focal point of camera , then for all , and by Corollary 3.9, . Thus we get that .
∎
Example 3.10 shows that the assumption of distinct foci is necessary for Lemma 4.1. In this example, and by Corollary 3.4. We see that has a component other than .
4.2. Faugeras et al. [8]
The second set of polynomials we will study was constructed by Faugeras & Mourrain while proving that the multiview variety is cut out by epipolar/bifocal and trifocal polynomials, and that the quadrifocal constraints corresponding to the quadrifocal tensor were not needed [7, 8].
Observe that implies , for each , or equivalently, , where
[TABLE]
represents taking cross product with , i.e. , . Stacking all matrices , we get the partially symbolic matrix
[TABLE]
If there is a world point satisfying , then this matrix is rank deficient and all maximal minors of vanish on the multiview variety.
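The cross-product construction can be tried out numerically. This sketch assumes the standard skew-symmetric matrix for the cross product and stacks the products of that matrix with each camera; the two cameras are hypothetical. At consistent images every 4x4 minor of the stacked matrix vanishes, and after perturbing one image at least one minor is nonzero.

```python
from itertools import combinations

def det(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def matvec(a, v):
    return [sum(r * x for r, x in zip(row, v)) for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def skew(x):
    """Matrix of the map y -> x cross y."""
    return [[0, -x[2], x[1]],
            [x[2], 0, -x[0]],
            [-x[1], x[0], 0]]

def stacked(cameras, images):
    """Stack the 3x4 blocks skew(x_i) * A_i vertically."""
    rows = []
    for a, x in zip(cameras, images):
        rows.extend(matmul(skew(x), a))
    return rows

def maximal_minors(m):
    width = len(m[0])
    return [det([m[r] for r in rows])
            for rows in combinations(range(len(m)), width)]

# Hypothetical cameras and a world point q.
A1 = [[1, 0, 0, -1], [0, 1, 0, -2], [0, 0, 1, -3]]
A2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
q = [2, 5, 7, 1]
images = [matvec(A1, q), matvec(A2, q)]

minors_on = maximal_minors(stacked([A1, A2], images))
minors_off = maximal_minors(stacked([A1, A2], [images[0], [2, 5, 9]]))
```

With two cameras the stacked matrix is 6x4, giving fifteen maximal minors; the world point `q` lies in its kernel whenever the images are consistent.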
Definition 4.2.
The ideal of all maximal minors of , denoted by , will be called the Faugeras ideal of the arrangement . We denote the subideals of generated by minors involving only two and three cameras by and , respectively.
We now describe a sequence of matrix transformations that allow us to obtain from . Let be the symbolic block diagonal matrix of size . Multiplying on the left by the block diagonal matrix and dropping the rightmost columns of the resulting matrix, we obtain :
[TABLE]
where as before, we abuse notation to let also represent the matrix obtained by stacking the cameras vertically. From the matrix constructions of and , we observe that their projective vanishing sets in coincide.
Lemma 4.3.
For any camera arrangement with pairwise distinct foci, .
Proof.
The proof will follow from Lemma 4.1 if we can show that . If is such that drops rank, then there exists a nonzero so that for all . This means there exist nonzero scale factors such that . The vector is a nontrivial element in , so is rank deficient. Therefore
For the other inclusion, if there is a nontrivial for some , then as in the proof of Lemma 4.1, must be nonzero, and so is a nontrivial element of . This shows that , hence .
∎
4.3. Ma et al. [17]
The third and final set of polynomials we will study are the so-called multiview rank constraints, which were proposed by Ma and collaborators [17] as an alternative to the multilinear constraints studied, for example, in Hartley & Zisserman [11].
Suppose and for . Starting with , a series of matrix operations are described in Chapter 8 in [17] to arrive at a new set of determinantal polynomials, arising as maximal minors of
[TABLE]
Definition 4.4.
The ideal of all maximal minors of , denoted by , will be called the Ma ideal of the arrangement .
We observe that can be obtained from by multiplying by a single matrix on the right:
[TABLE]
From this we observe that has the same projective vanishing set as , and hence and .
Lemma 4.5.
For any camera arrangement with pairwise distinct foci and , .
Proof.
If is such that drops rank, then there exists a nontrivial . Therefore, is nontrivial. Note that it is necessary that we assume so that . This shows that .
For the other inclusion, if for some , then since , there exists a scalar such that . This means that , which is nontrivial because if , then , so . This shows , and the desired result follows from Lemma 4.3.
∎
Observe that is generated by polynomials of total degree 3. This fact has an interesting consequence. As we mentioned earlier, has been proposed as an alternative algebraic foundation for multiview geometry. From Lemma 4.5, we know that it cuts out the multiview variety. Since is the vanishing ideal of the multiview variety, we get that . However, from Theorem 3.7 we know that , i.e., it is generated by polynomials of degree two and three, which means that in general and instead or equivalently . This means that the bifocals and trifocals imply the multiview rank constraints, but not the other way around. Similarly, and , which are generated by polynomials of total degree and four respectively, are properly contained in . We see this in Example 4.6 below.
4.4. Relationships to the Multiview Ideal
We now compute the three ideals on an example, foreshadowing their structural properties, which we examine next.
Example 4.6.
Consider the translational arrangement where , , whose multiview ideal is:
[TABLE]
The primary decompositions of , , and are
[TABLE]
[TABLE]
where is a component minimally generated by 133 polynomials of total degree up to eight.
While each of , , and notably contains as a component, the nature of their other components is worth further investigation. ∎
To analyze the extra components, we rely on several notions from commutative algebra, which we define next. The first notion is that of a multigraded ring. Consider the ring endowed with the -grading where and is the th standard basis vector in . We say a polynomial in this ring is homogeneous if all of its terms have the same multidegree.
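The multigraded notion of homogeneity can be checked mechanically. The following is a minimal sketch, assuming each camera contributes three image variables, written here as (x_i, y_i, z_i); these variable names and the dictionary encoding of polynomials are illustrative choices, not notation from the paper.

```python
def multidegree(exponents, n_cameras):
    """Multidegree of a monomial: the i-th entry is its degree in the
    three variables (x_i, y_i, z_i) of camera i (assumed variable layout)."""
    return tuple(sum(exponents[3 * i:3 * i + 3]) for i in range(n_cameras))


def is_multihomogeneous(poly, n_cameras):
    """A polynomial {exponent tuple: coefficient} is homogeneous in the
    multigrading when all nonzero terms share one multidegree."""
    degrees = {multidegree(e, n_cameras) for e, c in poly.items() if c != 0}
    return len(degrees) <= 1


# Two cameras, variables ordered (x1, y1, z1, x2, y2, z2).
# f = x1*y2 - y1*x2: every term has multidegree (1, 1).
f = {(1, 0, 0, 0, 1, 0): 1, (0, 1, 0, 1, 0, 0): -1}

# g = x1*y2 + x1*y1: total degree 2 in both terms, but the multidegrees
# are (1, 1) and (2, 0), so g is not homogeneous in the multigrading.
g = {(1, 0, 0, 0, 1, 0): 1, (1, 1, 0, 0, 0, 0): 1}

f_ok = is_multihomogeneous(f, 2)
g_ok = is_multihomogeneous(g, 2)
```

The polynomial g shows why the multigrading is finer than the usual grading: it is homogeneous of total degree two, yet it mixes two different multidegrees.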
The irrelevant ideal in this grading, which we denote by , is the intersection of the ideals :
[TABLE]
Observe that is generated by all multilinear monomials of multidegree and total degree . It is the maximal ideal in the ring generated by homogeneous elements of strictly positive multidegree.
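To make the generators of the irrelevant ideal concrete, the sketch below lists the multilinear monomials obtained by picking one coordinate from each camera. The variable names x_i, y_i, z_i and the string encoding are assumptions made only for illustration.

```python
from itertools import product


def irrelevant_generators(n_cameras):
    """All monomials that use exactly one coordinate from each camera.

    Variable names x_i, y_i, z_i are an assumed naming convention."""
    coords = ("x", "y", "z")
    return [
        "*".join(f"{c}{i}" for i, c in enumerate(choice, start=1))
        for choice in product(coords, repeat=n_cameras)
    ]


gens_two = irrelevant_generators(2)    # 3^2 = 9 monomials
gens_three = irrelevant_generators(3)  # 3^3 = 27 monomials
```

For n cameras this produces 3^n monomials, each of multidegree (1, ..., 1) and total degree n, matching the description above.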
The radical of an ideal is the ideal . If is a homogeneous ideal then so is its radical, and . The colon of an ideal with the ideal , denoted as is the set of all polynomials such that for all , i.e. ,
Recall that the projective varieties of the ideals , , and all agree and equal the multiview variety . We can now state a first relationship among the ideals that follows easily from the projective Nullstellensatz in our multigraded setting, whose statement and proof will appear in Appendix A.
Theorem 4.7.
For any with pairwise distinct foci,
- a) .
- b) .
- c) when .
Proof.
See Appendix A. ∎
In the language of algebraic geometry, this says that and all cut out the multiview variety scheme-theoretically. They are not equal as ideals, but they agree in high enough multidegree with ; see [10, p. 50].
We now strengthen Theorem 4.7 (a) and (b) to show that the operation of taking the radical is not needed, i.e. , and . This means that and already cut out the multiview variety scheme-theoretically. Experimental evidence suggests that when , such a result is also true for , but an explicit proof is made difficult by the convoluted structure of the minors of .
We first show that the simple structure of the primary decomposition of observed in Example 4.6 holds in general.
Lemma 4.8.
For any camera arrangement with pairwise distinct foci, . In particular, is a radical ideal with prime decomposition .
Proof.
Suppose is a generator of , i.e. , a maximal minor of . Then . Also, since vanishes on , . Therefore, .
Now suppose . Since is generated by bifocals and trifocals where ’s are bifocals, ’s are trifocals, are monomials, and are scalars. Further, since , every term in is divisible by some generator of where . Now consider . Since involves only two cameras, it must be that contains a variable from each of the other cameras so that each term of lies in . This makes a monomial multiple of a -focal by Lemma 2.2. The same argument holds for . Thus, . ∎
Proposition b3 in [22] proves that when is minor-generic, is a radical ideal. Lemma 4.8 shows that is always a radical ideal under the weaker assumption of distinct foci.
Theorem 4.9.
For any camera arrangement with pairwise distinct foci, .
Proof.
We first note that . Suppose . Then for any monomial generator of . Since is prime and does not contain any monomials, . Since by Lemma 4.8, . ∎
We now consider the Faugeras ideal and prove that . The nontrivial part is to argue that is contained in . This fact relies on the following technical lemma, similar in flavor to Lemma 2.2, which shows that bifocals and trifocals can both be multiplied by any generator of to fall into .
Lemma 4.10.
- a) For cameras, and any monomial , there exists a minor of such that .
- b) Let and be pairwise distinct. Then for any trifocal and any coordinate , there exists a minor of such that .
Proof.
See Appendix B both for the notation and the proof. ∎
Theorem 4.11.
For any camera arrangement with pairwise distinct foci, .
Proof.
The containment follows as in Theorem 4.9 because and hence, . The other containment will follow by showing . For general camera arrangements with cameras, recall that (resp. ) is the ideal generated by all minors of that involve only two (resp. three) cameras. By Lemma 4.10(a), for any multilinear monomial and any bifocal , for some Faugeras minor , hence . We address the trifocals in two cases. First consider the case when the two rows eliminated from to form a trifocal come from the same camera, say without loss of generality, from camera . In this case, for some , and Lemma 4.10(a) again implies . For the case when the two rows from to form come from different cameras, Lemma 4.10(b) implies that, for any , for some . We conclude that , as desired. ∎
5. The Bifocal Ideal
We saw in Theorem 3.7 that the bifocals and trifocals together generate the multiview ideal when the camera foci are pairwise distinct. In this section, we investigate how imposing further conditions on the cameras can lead to an even simpler description of the multiview ideal. Heyden and Åström [12] and Trager et al. [21] show that when the camera foci are not all on a plane, the bifocals are necessary and sufficient to cut out the multiview variety. There has also been work to further reduce this description by considering the minimal number of bifocals needed ([12], [23]), though we will not address this question here. In this section, we focus on the ideal-theoretic relationship between the bifocal ideal and the multiview ideal when the camera foci are noncoplanar.
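As a numerical illustration of what a bifocal encodes, the sketch below evaluates the determinant of the 6×6 matrix [A_1 | x_1 | 0 ; A_2 | 0 | x_2] for two translational cameras. This matrix form is the standard one for a two-camera constraint and is assumed here to match the definition given earlier in the paper; the camera translations and the world point are made-up test data.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def translational(t):
    """3x4 camera matrix [I | t] (illustrative translational camera)."""
    return [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]]]


def project(camera, world):
    """Image of a homogeneous world point (4 coordinates) in a camera."""
    return [sum(camera[r][k] * world[k] for k in range(4)) for r in range(3)]


def bifocal(cam_a, cam_b, img_a, img_b):
    """Determinant of [A | x_a | 0 ; B | 0 | x_b] (assumed bifocal form)."""
    top = [cam_a[r] + [img_a[r], 0] for r in range(3)]
    bottom = [cam_b[r] + [0, img_b[r]] for r in range(3)]
    return det(top + bottom)


cam1 = translational([0, 0, 0])
cam2 = translational([1, 0, 0])
world = [2, 3, 5, 1]

x1 = project(cam1, world)
x2 = project(cam2, world)

on_variety = bifocal(cam1, cam2, x1, x2)          # images of one world point
off_variety = bifocal(cam1, cam2, x1, [3, 4, 5])  # a mismatched image pair
```

The first value vanishes because both images come from the same world point, while the mismatched pair gives a nonzero determinant. Exact rational arithmetic is used so that "zero" means exactly zero rather than zero up to rounding.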
To motivate our investigation, we start with some examples. We say that a camera arrangement is coplanar, noncoplanar or collinear if their foci have the corresponding property.
Example 5.1.
Consider the four noncoplanar translational camera arrangement where , , , . Eliminating the variables and from the ideal , we observe occurs as a component in
[TABLE]
Example 5.2.
Consider the four coplanar translational camera arrangement where , , , . We observe that where
[TABLE]
In Example 5.1, each extra component of contains an irrelevant ideal and hence does not contribute to . Saturating the bifocal ideal with respect to the full irrelevant ideal removes these components. We will prove that this is always true when camera foci are noncoplanar. We begin by proving a series of three lemmas.
Lemma 5.3.
Suppose is an arrangement of cameras with pairwise distinct foci. Then is noncoplanar .
Proof.
. If is noncoplanar, then there is some subset of four cameras that is noncoplanar. Order the cameras in so that these are the cameras . By a change of coordinates on , we can send the foci of the cameras to the foci of the cameras in from Example 5.1. Then, by Lemma 2.5, applying coordinate changes using some , we can assume that is an arrangement of translational cameras. These transformations fix the first four cameras. We think of the cameras for as variable, represented symbolically by their translations, and the implication can be confirmed by direct calculation in Macaulay2.
. In this case, the full computation is too expensive. To make the computation feasible, we split the proof into two cases, depending on whether the arrangement has five collinear cameras or not.
Case I: If a noncoplanar arrangement of seven cameras has at most four collinear cameras, then every four camera subset can be augmented with two additional cameras to get a noncoplanar arrangement of six cameras. Thus every 7-focal of such an arrangement, which looks like for some quadrifocal , has the form of a 6-focal from a noncoplanar arrangement, say , multiplied by a coordinate . The case shows that is generated by 2-focals, hence is generated by 2-focals.
Case II: We now consider the case of noncoplanar seven camera arrangements in which five cameras are collinear. In this case, by a proper choice of camera ordering and coordinate change, we can assume the translations of are of the form where the are symbolic. This makes collinear. The choice to take the line that the cameras lie on to be the axis is arbitrary, but can be made without loss of generality. This arrangement is now described by few enough variables to enable a direct computation showing that .
. Now suppose and is an -focal of . Recall that involves all cameras, but at most four cameras can contribute two rows to the matrix whose determinant is . At one extreme, these four cameras may be and at the other extreme they might be four cameras different from the first four, which we call . Thus the -focal is a monomial multiple of an 8-focal of where where is a quadrifocal and is a monomial.
If the four cameras contributing to involve , then is a multiple of a 7-focal from noncoplanar cameras. On the other hand, if , then can be generated by the trifocals of by Lemma 3.3:
[TABLE]
In particular, this shows that can be generated from 7-focals, . These come from noncoplanar seven camera arrangements because are noncoplanar. In either case, we know that such 7-focals can be generated by 2-focals, hence . It follows that , as desired. ∎
Lemma 5.4.
Suppose is an arrangement of cameras with pairwise distinct foci. Then .
Proof.
If , then , vanishes on . Since is prime and does not contain any monomials, . Therefore, . For the other containment, by Theorem 3.7, it suffices to show that and are contained in . It is clear that . By Lemma 2.2, multiplying any by a generator of yields a monomial multiple of an -focal. By assumption, this -focal lies in . Thus, and . ∎
Lemma 5.5.
Suppose is an arrangement of cameras with pairwise distinct foci. Then is noncoplanar.
Proof.
We prove the contrapositive, namely that if is coplanar then . We will construct a point , from which the result will follow.
Let be the normal vector of a plane containing the foci of the cameras in . If the foci are not collinear then is unique, otherwise we choose any plane containing the foci and its normal . Let denote the image of the plane in camera , and let denote the image of the focal point of camera in image . Then since the focal point of camera lies in . Choose and . Then there is a unique world point on whose images in cameras and are and . Let be the (unique) image of in camera . Then satisfy trifocal constraints. Choose and some for . By construction, . Since the cameras are coplanar, the epipolar plane given by and any two cameras and is for any pair . By choosing for all , we force every bifocal polynomial to vanish on . Therefore by construction, , but since , we conclude that . ∎
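The construction in this proof can be tried numerically. The sketch below is a hedged illustration, not the paper's computation: it uses three translational cameras whose foci lie in the plane with normal (1, 0, 0), picks one image point on the image of that plane in each camera, and compares ranks. It assumes the bifocal is the determinant of the two-camera stacked matrix and that a point lies on the multiview variety exactly when the three-camera stacked matrix drops rank, as the rank conditions of Section 4 suggest; the specific cameras and image points are made up.

```python
from fractions import Fraction
from itertools import combinations


def rank(m):
    """Exact rank by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    rows, cols, r = len(a), len(a[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            for j in range(c, cols):
                a[i][j] -= factor * a[r][j]
        r += 1
    return r


def translational(t):
    """3x4 camera matrix [I | t]; its focus is -t."""
    return [[1, 0, 0, t[0]], [0, 1, 0, t[1]], [0, 0, 1, t[2]]]


def stacked(cameras, images):
    """Rows [A_i | 0 ... x_i ... 0], with one image column per camera."""
    n = len(cameras)
    out = []
    for i, (cam, img) in enumerate(zip(cameras, images)):
        for r in range(3):
            extra = [0] * n
            extra[i] = img[r]
            out.append(cam[r] + extra)
    return out


# Foci (0,0,0), (0,-1,0), (0,0,-1) all lie in the plane with normal (1,0,0).
cameras = [translational(t) for t in ([0, 0, 0], [0, 1, 0], [0, 0, 1])]

# One point per image on the image line of that plane (first coordinate 0).
images = [[0, 1, 1], [0, 2, 1], [0, 5, 1]]

# Every 6x6 two-camera matrix is singular, so every bifocal vanishes here.
pair_ranks = [
    rank(stacked([cameras[i], cameras[j]], [images[i], images[j]]))
    for i, j in combinations(range(3), 2)
]

# The 9x7 three-camera matrix has full column rank: no common world point.
spurious_rank = rank(stacked(cameras, images))

# Contrast: genuine images of the world point (2, 3, 5, 1) drop the rank.
genuine = [[2, 3, 5], [2, 4, 5], [2, 3, 6]]
genuine_rank = rank(stacked(cameras, genuine))
```

In this example the three back-projected rays all lie in the plane of the foci, so they meet pairwise, but they are not concurrent. That is why the pairwise conditions hold while the three-camera matrix keeps full column rank.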
Together, Lemmas 5.3, 5.4, 5.5 imply the following theorem.
Theorem 5.6.
Suppose is an arrangement of cameras with pairwise distinct foci. Then the following are equivalent.
- (a) is noncoplanar.
- (b) .
- (c) .
We now make some observations about Theorem 5.6.
Theorem 6.1 in [12] observes that for noncoplanar while Proposition 5 (2) in [21] further shows that is equivalent to the foci of being noncoplanar. Our Theorem 5.6 proves the analogous ideal statement, namely that noncoplanarity of foci is equivalent to .
Example 5.2 shows how Theorem 5.6 fails when is coplanar. The bifocal ideal contains the component , which cannot be removed by saturating with respect to . Its variety cuts out the projections of the plane containing the foci of in each camera image. This plane in has normal vector . The following example shows that further degeneracy occurs when camera foci are collinear.
Example 5.7.
Consider the four collinear translational camera arrangement where , , , . Here, , but both ideals are prime, so cannot occur as a component of . In addition, the dimension of is one larger than that of . This is explained by the fact that there is an entire one-dimensional family of planes that contains the camera centers of .
As seen in the above examples and discussion, the relation between and can be complicated when camera centers are coplanar or collinear. Determining the exact relationship between ideals in these degenerate settings would be an interesting problem for the future.
In Theorem 5.6 we showed that when cameras are noncoplanar, the -focal ideal becomes a subset of the -focal ideal. We now give an example to show that this containment need not hold for where . The construction relies on having three of five cameras being collinear.
Example 5.8.
Consider the five translational camera arrangement with . Theorem 5.6 shows that since is noncoplanar. However the following trifocal from ,
[TABLE]
is not in . Similarly, the quadrifocal,
[TABLE]
from cameras is not in .
6. Finite Images
The results of the previous sections have important practical consequences when we restrict attention to the set of all finite images, that is to all with for all . The vanishing ideal of this affine patch is obtained by dehomogenizing with respect to the variables from each image plane. We call this the affine multiview ideal of and denote it , where is the map setting each to 1. From Theorem 3.7, we see that is generated by dehomogenized bifocals and dehomogenized trifocals when the foci of are pairwise distinct.
Corollary 6.1.
If is a camera arrangement with pairwise distinct foci, then .
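Dehomogenization is easy to carry out on an explicit polynomial. The sketch below is illustrative only: it assumes three variables (x_i, y_i, z_i) per camera, encodes a polynomial as a dictionary from exponent tuples to coefficients, and sets each z_i to 1. The sample polynomial has the bilinear shape that a bifocal takes for a pair of translational cameras, but it is a made-up example rather than one taken from the paper.

```python
def dehomogenize(poly):
    """Set every third variable (z_i) to 1: drop its exponent from each
    term and merge terms that become equal."""
    out = {}
    for exps, coeff in poly.items():
        key = tuple(e for k, e in enumerate(exps) if k % 3 != 2)
        out[key] = out.get(key, 0) + coeff
    return {k: c for k, c in out.items() if c != 0}


# Variables ordered (x1, y1, z1, x2, y2, z2).
# f = x1*y2 - y1*x2 + y1*z2 - z1*y2, of multidegree (1, 1).
f = {
    (1, 0, 0, 0, 1, 0): 1,
    (0, 1, 0, 1, 0, 0): -1,
    (0, 1, 0, 0, 0, 1): 1,
    (0, 0, 1, 0, 1, 0): -1,
}

# Result in the affine variables (x1, y1, x2, y2): x1*y2 - y1*x2 + y1 - y2.
f_affine = dehomogenize(f)
affine_degree = max(sum(e) for e in f_affine)
```

The affine polynomial is no longer homogeneous, but its total degree is still at most two, consistent with dehomogenized bifocals being quadratics.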
Using the following fact about dehomogenizing colon ideals, the results of Section 4 yield a nice relation among , and the affine multiview ideal, .
Lemma 6.2.
For ideals , .
Proof.
If , then for some which satisfies for all . Therefore for any , proving . If , then for any , , i.e. , there exists such that . Denote the homogenization of with respect to by . We claim that . Indeed for any , for some . Homogenizing both sides, we get , and we conclude that ∎
Corollary 6.3.
If is a camera arrangement with pairwise distinct foci, then .
Proof.
Lemma 6.2 implies that for any ideal . Dehomogenizing Theorems 4.9, 4.11, and 4.7, each equality follows. ∎
Observe that the last equality in Corollary 6.3 requires . Geometrically, Corollary 6.3 shows that while the homogeneous ideals , and do not coincide, they are the same away from the origin in each image plane. In particular, this is the case on the affine patch corresponding to finite image data.
Using Theorem 5.6 we see that, when is noncoplanar, the dehomogenized bifocals alone suffice to generate the affine multiview ideal .
Corollary 6.4.
Suppose is a noncoplanar camera arrangement with pairwise distinct foci. Then
[TABLE]
Proof.
Dehomogenizing the result of Theorem 5.6, we get ∎
Corollary 6.4 shows that is generated by quadratics whenever satisfies the noncoplanarity assumption. This observation was used in [1] to create a semidefinite programming relaxation of the triangulation problem, which can be seen as minimizing the Euclidean distance from an observed noisy data point to the affine multiview variety. It was shown that when the noise is small, the semidefinite relaxation solves triangulation. Of course, Corollary 6.4 needs the foci of the cameras to be noncoplanar, and indeed the experiments in [1] show that the quality of the semidefinite programming solution deteriorates as the foci become coplanar and then collinear.
Geometrically, we can understand how the quality of the relaxation deteriorates because the bifocal ideal cuts out more than the multiview variety for coplanar arrangements. In the coplanar case, the bifocal ideal cuts out the image of the plane that contains the camera centers. These points are not the images of true 3D points. It is therefore possible that the nearest point problem yields a spurious solution on this extra component. Similarly, in the collinear case, the bifocal ideal cuts out a strictly larger variety than just the multiview variety. In this case, the dimension of the vanishing set of the bifocal ideal is one larger than the multiview variety.
7. Summary
The multiview variety is a foundational geometric object in multiview geometry and understanding its vanishing ideal precisely is important for any algebraic algorithm that solves problems on this variety. There have been many partial results about the algebraic structure of the multiview variety. The aim of our paper is to put them all into a unified algebraic setting and give a complete description of .
Our main result is that when the foci of the cameras are pairwise distinct, is generated by the bifocal and trifocal polynomials of (Theorem 3.7). The proof requires an understanding of the behavior of coordinate changes on -focal ideals (Lemma 2.5), and translational cameras (Lemma 3.3). The main result holds for Euclidean cameras as well (Corollary 3.8). We also give an example to illustrate that the assumption of distinct foci cannot be relaxed for this result to hold (Example 3.10).
Next we study three sets of polynomials that have been proposed to cut out the multiview variety, by Heyden-Åström, Faugeras and Ma et al., respectively. We show that the ideals generated by these polynomials are all properly contained in . We establish the exact algebraic relationships between the above ideals and (Theorems 4.7, 4.9 and 4.11).
We then prove that if the camera foci are assumed to be noncoplanar, then in fact is the saturation of the bifocal ideal by the irrelevant ideal (Theorem 5.6). In this situation the -focal ideal is a subset of the bifocal ideal.
Finally, we prove that the dehomogenizations of the ideals by Heyden-Åström, Faugeras and Ma et al. all agree with the dehomogenization of (Corollary 6.3). Similarly, under noncoplanarity of foci, the bifocal ideal also has the same dehomogenization (Corollary 6.4). This means that all of these ideals cut out the space of finite images.
8. Acknowledgements
We wish to thank the referees of this paper for their careful reading and suggestions. In particular, their comments helped fill a gap in the proof of the main theorem of Section 5.
Andrew Pryhuber and Rekha R. Thomas acknowledge support from the U.S. National Science Foundation through the grant DMS-1719538.
Appendix A: Multigraded Projective Nullstellensatz
In this appendix, we state and prove the projective Nullstellensatz in our multigraded setting, which we use to prove Theorem 4.7 in Section 4. Let be homogeneous with respect to the -grading . To be clear about projective versus affine varieties, we define , and for a set , we define
[TABLE]
We say that is the projective vanishing set of in and is the largest homogeneous ideal vanishing on contained in . While we force , it also makes sense to consider the largest homogeneous ideal vanishing on without intersecting with . As before we denote this ideal by , and notice that . In the usual grading on , a vanishing ideal is homogeneous in the usual sense which means that it is contained in the usual irrelevant ideal . Under the multi-grading, is required to be in the corresponding irrelevant ideal . We will use the following variant of the Nullstellensatz.
Lemma 8.1.
For any homogeneous ideal such that ,
Proof.
Define the affine operations
[TABLE]
where we treat as a subset of . We will use the affine version of the Nullstellensatz on the cone over , i.e. , the set . We claim that
[TABLE]
First suppose . Given , all homogeneous coordinates of , represented by scalings , lie in , so vanishes for all homogeneous coordinates of . This means that the homogeneous components of , consisting of all terms with multidegree , vanish at , so , hence . By the Nullstellensatz in , , and by the assumption that , . This shows that .
Conversely, suppose . Since any point of such that for all gives homogeneous coordinates for a point in , it follows that vanishes on . We need to show that vanishes on each of the sets . Since , it has strictly positive multidegree, and every monomial in contains at least one coordinate from each copy of . Setting all 3 coordinates to zero in any forces to be zero, so we conclude that . Finally, from (11), we conclude
[TABLE]
∎
Corollary 8.2.
For any homogeneous ideal , .
Proof.
Observe that
[TABLE]
and
[TABLE]
Therefore by Lemma 8.1, . ∎
Corollary 8.3.
For any with pairwise distinct foci,
[TABLE]
Proof.
We have already shown in Section 4 that . Since is radical, the result follows by Corollary 8.2. ∎
We can now prove Theorem 4.7, restated here, from the main body of the paper.
Theorem 8.4.
For any with pairwise distinct foci,
- a)
- b)
- c) when
Proof.
Taking colon ideal with , the desired result follows from Corollary 8.3 and the fact that , which was proven in Theorem 4.9.
∎
Appendix B: Technical Proofs
In this appendix, we elaborate on the technical details used to prove Theorem 4.11. Recall that the nontrivial statement there was that bifocals and trifocals can be multiplied by any generator of to fall into . This requires understanding the minors of for which we once again invoke the Cauchy-Binet formula and the observation that from (7).
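For readers who want to see the Cauchy-Binet formula in action before it is applied to these minors, the following sketch checks it on a small integer example: for a 2×4 matrix A and a 4×2 matrix B, det(AB) equals the sum over all pairs of column indices S of det(A restricted to columns S) times det(B restricted to rows S). The matrices are arbitrary test data and are unrelated to the matrices of the paper.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row
    (adequate for the small integer matrices used here)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def matmul(a, b):
    """Plain matrix product of lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cauchy_binet(a, b):
    """Sum of det(A[:, S]) * det(B[S, :]) over all index sets S of size m,
    where A is m x n and B is n x m."""
    m, n = len(a), len(b)
    total = 0
    for s in combinations(range(n), m):
        a_s = [[row[k] for k in s] for row in a]
        b_s = [b[k] for k in s]
        total += det(a_s) * det(b_s)
    return total


a = [[1, 2, 0, 3], [0, 1, 4, 1]]
b = [[2, 1], [0, 3], [1, 1], [5, 0]]

lhs = det(matmul(a, b))
rhs = cauchy_binet(a, b)
```

Both sides agree on this example. The same expansion, applied to submatrices of a product, is what lets a minor of a product be written in terms of minors of its factors.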
First we characterize certain minors of . Let denote the th coordinate of , i.e. , , , and . Having the subscript (resp. superscript) on indicates eliminating from the unique row (resp. column) of that does not contain . On the other hand, having the subscript on the matrices and will stand for eliminating the unique row of the matrix containing .
We will only need to consider the minors of when and . Let denote collections of coordinates, and write , . When , a minor of is for some , of size , and when , . Observe that if for any , then the submatrix has at least two linearly dependent rows or columns, yielding a zero minor. When for all , is block diagonal, so .
Lemma 8.5.
Let . The nonzero minors of are determined by collections of coordinates with . For and , the minor is the monomial
[TABLE]
Proof.
As noted above, if for either , then , whereas if for either , then has a rank 2 block on its diagonal, hence , proving the first statement. For and , the minor is
[TABLE]
∎
Lemma 8.6.
Let . Suppose , and . For , the minor is the monomial
[TABLE]
where is the coordinate common to and and is the coordinate common to and .
Proof.
When as sets for or , then , hence . On the other hand, when , where . Similarly where when . ∎
We now show that bifocals and trifocals can both be multiplied by any generator of to fall into .
Lemma 8.7.
- a) For cameras, and any monomial , there exists a minor of such that .
- b) Let and be pairwise distinct. Then for any trifocal and any coordinate , there exists a minor of such that .
Proof.
(a) Fix some . Since , is a matrix and we need to delete two rows to get a minor. Using Lemma 8.5 and Cauchy-Binet, the result follows from the computation below:
[TABLE]
where the last equality follows from expanding the determinant of along the last two columns.
(b) Without loss of generality, let , , and let be arbitrary. For simplicity, suppose . Therefore, we consider the trifocal . Using Lemma 8.6 and Cauchy-Binet, we expand where , , as follows:
[TABLE]
[TABLE]
Observe that the final equality follows from expanding the determinant of on the column.
For general , performing the same computation with , and yields . ∎
References
- [1] C. Aholt, S. Agarwal, and R. Thomas, A QCQP approach to triangulation, in Proceedings of the European Conference on Computer Vision, 2012, pp. 654–667.
- [2] C. Aholt, B. Sturmfels, and R. Thomas, A Hilbert scheme in computer vision, Canadian Journal of Mathematics, 65 (2013), pp. 961–988.
- [3] G. Blekherman, P. A. Parrilo, and R. R. Thomas, Semidefinite Optimization and Convex Algebraic Geometry, SIAM, 2012.
- [4] J. G. Broida and S. G. Williamson, A Comprehensive Introduction to Linear Algebra, Addison-Wesley, 1989.
- [5] A. Conca, E. De Negri, and E. Gorla, Cartwright-Sturmfels ideals associated to graphs and linear spaces, arXiv preprint arXiv:1705.00575, (2017).
- [6] D. A. Cox, J. Little, and D. O’Shea, Ideals, Varieties, and Algorithms, Springer, 4 ed., 2015.
- [7] O. Faugeras, Q.-T. Luong, and T. Papadopoulou, The Geometry of Multiple Images: The Laws that Govern the Formation of Images of a Scene and Some of Their Applications, MIT Press, 2001.
- [8] O. Faugeras and B. Mourrain, On the geometry and algebra of the point and line correspondences between n images, in Proceedings of the IEEE International Conference on Computer Vision, 1995, pp. 951–956.
