Generic Primitive Detection in Point Clouds Using Novel Minimal Quadric Fits
Tolga Birdal, Benjamin Busam, Nassir Navab, Slobodan Ilic, and Peter Sturm

TL;DR
This paper introduces a novel method for detecting various 3D primitives in cluttered point clouds using minimal quadric fits, a new Hough voting scheme, and RANSAC, enabling accurate, efficient, and segmentation-free detection of multiple primitive types.
Contribution
The paper presents the first unified approach for generic primitive detection in point clouds using minimal quadric fits and a new voting scheme, without requiring segmentation.
Findings
Effective detection of multiple primitive types in cluttered scenes.
Reduced computational complexity from O(N^4) to O(N^3).
Demonstrated high accuracy and flexibility through extensive experiments.
Abstract
We present a novel and effective method for detecting 3D primitives in cluttered, unorganized point clouds, without auxiliary segmentation or type specification. We consider the quadric surfaces for encapsulating the basic building blocks of our environments - planes, spheres, ellipsoids, cones or cylinders, in a unified fashion. Moreover, quadrics allow us to model higher degree of freedom shapes, such as hyperboloids or paraboloids that could be used in non-rigid settings. We begin by contributing two novel quadric fits targeting 3D point sets that are endowed with tangent space information. Based upon the idea of aligning the quadric gradients with the surface normals, our first formulation is exact and requires as few as four oriented points. The second fit approximates the first, and reduces the computational effort. We theoretically analyze these fits with rigor, and give…
| Dataset | Source | # Objects | Type | Occlusion | Accuracy |
|---|---|---|---|---|---|
| Pilates Ball 1 | Ours | 580 | Generic | Yes | 94.40% |
| Rugby Ball | [62] | 1337 | Generic | No | 100.00% |
| Pilates Ball 2 | [62] | 1412 | Sphere | Yes | 100.00% |
| Big Globe | [62] | 2612 | Sphere | Yes | 90.70% |
| Small Globe | [62] | 379 | Sphere | Yes | 56.90% |
| Apple | [62] | 577 | Sphere | Yes | 99.60% |
| Football | [62] | 1145 | Sphere | Yes | 100.00% |
| Orange Ball | [62] | 270 | Sphere | Yes | 93.30% |
Taxonomy
Topics: 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis · Image and Object Detection Techniques
Generic Primitive Detection in Point Clouds Using Novel Minimal Quadric Fits
Tolga Birdal, Benjamin Busam, Nassir Navab, Slobodan Ilic, Peter Sturm
T. Birdal, B. Busam, N. Navab and S. Ilic are with the Department of Informatics, Technical University of Munich, Germany. E-mail: [email protected], [email protected]
T. Birdal and S. Ilic are with Siemens AG, Munich, Germany. E-mail: [email protected], [email protected]
B. Busam is with Framos GmbH, Munich, Germany. E-mail: [email protected]
P. Sturm is with INRIA, Grenoble, France. E-mail: [email protected]
Manuscript received June XX, 2018.
Abstract
We present a novel and effective method for detecting 3D primitives in cluttered, unorganized point clouds, without auxiliary segmentation or type specification. We consider the quadric surfaces for encapsulating the basic building blocks of our environments - planes, spheres, ellipsoids, cones or cylinders, in a unified fashion. Moreover, quadrics allow us to model higher degree of freedom shapes, such as hyperboloids or paraboloids that could be used in non-rigid settings.
We begin by contributing two novel quadric fits targeting 3D point sets that are endowed with tangent space information. Based upon the idea of aligning the quadric gradients with the surface normals, our first formulation is exact and requires as few as four oriented points. The second fit approximates the first, and reduces the computational effort. We theoretically analyze these fits with rigor, and give algebraic and geometric arguments. Next, by re-parameterizing the solution, we devise a new local Hough voting scheme on the null-space coefficients that is combined with RANSAC, reducing the complexity from $O(N^4)$ to $O(N^3)$ (three points). To the best of our knowledge, this is the first method capable of performing a generic cross-type multi-object primitive detection in difficult scenes without segmentation. Our extensive qualitative and quantitative results show that our method is efficient and flexible, as well as being accurate.
Index Terms:
Quadrics, Surface Fitting, Implicit Surfaces, Point Clouds, 3D Surface Detection, Primitive Fitting, Minimal Problems
1 Introduction
Surface fitting and detection enjoy a rich history in the computer vision and graphics communities. The problem is particularly important because of the power of 3D surfaces to explain generic man-made structures omnipresent in everyday life. Many of the constructed or manufactured objects and much of the architecture that surrounds us are results of careful computer-aided design (CAD). Some of the primary concerns of 3D computer vision, mapping and reconstruction, try to associate the visual cues acquired by various 2D / 3D sensors with those idealized CAD models that are used in the assembly of our environments.
One family of approaches tries to find a direct rigid association between those CAD models and 3D scenes [1], trying to solve the six degree of freedom (DoF) pose estimation problem. While these approaches are quite successful as the only parameters to discover are rotations and translations, they require a huge number of CAD models to generically represent the real scenarios. To overcome this limitation, inspired by the fact that all CAD models are designed using a similar set of tools, a different line of research attempts to find common bases explaining a broad set of 3D objects, and tries to detect these bases instead of individual models. Such bases are the common building blocks of our world and are typically termed geometric primitives. While the approaches using bases significantly reduce the database size, the bases usually undergo higher dimensional transformations compared to, for instance, rigid ones. Examples of geometric primitives are splines and NURBS surfaces defined by several control points, or quadrics, the three-dimensional, nine-DoF quadratic forms.
Thanks to their power to embody the most typical geometric primitives, such as planes, spheres, cylinders, cones or ellipsoids, quadrics themselves have been of great interest since the 1980s [2]. Some exemplary studies involve recovering 3D quadrics from image projections [3], fitting them to 3D point sets [4], or detecting special cases of the quadratic forms [5]. A majority of those works either put emphasis on fitting to a noisy but isolated point set [4, 6, 7], or restrict the types of shapes under consideration (thereby reducing the DoF) to devise detectors robust to clutter and occlusions [8, 9, 10].
In this work, our aim is to unite the fitting and detection worlds and present an algorithm that can simultaneously estimate all parameters of a generic nine-DoF quadric, which resides in a cluttered 3D environment and is viewed potentially from a single 3D sensor, introducing occlusions and partial visibility. We craft this algorithm in three stages: (1) First, we devise a new quadric fit. Unlike its ancestors, this one uses the extra information about the tangent space to increase the number of constraints instead of regularizing the solution. This fit requires only four oriented points. We show that such a construction also has a regularization effect as a by-product. (2) We then thoroughly analyze its rank properties and devise a novel null-space Hough voting mechanism to reduce the four-point case to three. Three points is the smallest minimal case developed so far. (3) We propose a variant of RANSAC that operates on our local bases, which are randomly posited. For each local basis, we show how to make use of the fit and the voting to hypothesize a likely quadric. Finally, we use simple clustering heuristics to group and strengthen the candidate solutions. Our algorithm works purely on 3D point cloud data and does not depend upon any acquisition modality. Moreover, it makes assumptions neither about the type of the quadric that is present in the scene nor about how many are visible.
This journal paper extends our recent CVPR publication [11] by additionally providing the following:
1. qualitative and quantitative experiments to better grasp the behavior of the proposed fit,
2. algebraic and geometric theoretical analysis of the quadric fit,
3. improved, more elaborate descriptions of the method as well as accompanying pseudocode.
2 Related Work
2.1 Quadrics
Quadrics appear in various domains of vision, graphics and robotics. They are found to be one of the best local surface approximators in estimating differential properties [12]. Thus, point cloud normals and curvatures are oftentimes estimated with local quadrics [13, 1]. Yan et al. propose an iterative method for mesh segmentation by fitting local quadratic surfaces [14]. Yu presented a quadric-region based method for consistent point cloud segmentation [15]. Kukelova uses quadric intersections to solve minimal problems in computer vision [16]. Uto et al. [17] as well as Pas and Platt [18] use quadrics to localize grasp poses and in grasp planning. Quadrics have also been a significant center of attention in projective geometry and reconstruction [19, 3] to estimate algebraic properties of apparent contours. Finally, You and Zhang [20] used them in feature extraction from face data.
2.2 Primitive detection
Finding primitives in point clouds has kept vision researchers busy for a lengthy period of time. Works belonging to this category treat the primitive shapes independently [10], giving rise to specific fitting algorithms for planes, spheres, cones, cylinders, etc. Planes, as the simplest forms, are the primary targets of the Hough-family [21]. Yet, detection of a more general set of primitives made RANSAC the method of choice, as shown by the prosperous Globfit [22]: a relational local to global RANSAC algorithm. Schnabel et al. [23] and Tran et al. [24] also focused on reliable estimation using RANSAC. Monszpart et al. [25] proposed a greedy heuristic improving upon its randomized counterparts in plane detection. Oesau et al. [26] propose a tandem scheme for plane detection (by region-growing) and regularization to correct the imperfections of the hypotheses. The latter two improve upon Globfit by simultaneously extracting the planes and the primitive relations. Plane detection has recently been lifted to structural scales [27, 28]. An interesting application of primitives is given by Qui et al., who extract pipe runs using cylinder fitting [29]. The local Hough transform of Drost and Ilic [30] showed how the detection of primitives can be made more efficient by considering the local voting spaces. The authors give sphere, cylinder and plane specific formulations targeting point clouds. Lopez et al. [31] devise a robust ellipsoid fitting based on iterative non-linear optimization. Sveier et al. [32] suggest a conformal geometric algebra to spot planes, cylinders and spheres. Andrews' approach [5] deals with paraboloids and hyperboloids in CAD models. Even though this is slightly more generalized, paraboloids or hyperboloids are not the only geometric shapes described by quadrics.
Methods in this category are quite successful in shape detection, yet they handle the primitives separately. This prevents automatic type detection, or generalized modeling of surfaces.
2.3 Quadric fitting
Since the 1990s, generic quadric fitting has been cast as a constrained optimization problem, where the solution is obtained from a generalized eigenvalue decomposition of a scatter matrix. Pioneering work has been done by Gabriel Taubin [4], in which a Taylor approximation to the geometric distance is made. This work has then been enhanced by 3L [7], fitting a local, explicit ribbon surface composed of three level sets of constant Euclidean distance to the object boundary. This fit implicitly used the local surface information. Later, Tasdizen [6] improved the local surface properties by incorporating the surface normals as regularizers. This allows for a good and stable fit. Recently, Beale et al. [33] introduced the use of a Bayesian prior to regularize the fit. All of these methods use at least nine or twelve [33] points. Moreover, they only use surface normals as regularizers, not as additional constraints, and are also unable to deal with outliers in data. There are a few other studies [34, 8] improving these standard methods, but they either involve non-linear optimization [35] or share the common drawback of requiring nine independent constraints and no outlier treatment.
2.4 Quadric detection
Recovering general quadratic forms from cluttered and occluded scenes is a rather unexplored area. A promising direction was to represent quadrics with spline surfaces [36], but such approaches must tackle the increased number of control points, i.e. 8 for spheres, 12 for general quadrics [37, 38]. Segmentation is one way to overcome such difficulties [39, 36]. Besl and Jain suggested a variable order segmentation-based surface fitting. They, too, use an iterative procedure where the primitive order is raised incrementally [40]. This is not very different from performing individual primitive detection. Vaskevicius et al. [41] developed a noise-model aware quadric fitting and region-growing algorithm for segmented noisy scenes. However, segmentation, due to its nature, decouples the detection problem into two parts and introduces undesired errors, especially under occlusions. Other works exploit genetic algorithms [42] but have the obvious drawback of inefficiency. QDEGSAC [43] proposed a six-point hierarchical RANSAC, but the paper lacks an evaluation or method description for a quadric fit. Petitjean [12] stressed the necessity of outlier-aware quadric fitting, but only ends up suggesting M-estimators for future research.
Finally, the remarkable performance of deep neural networks (DNN) for learning in the 2D image domain [44, 45] has recently been extended to 3D point clouds [46, 47]. While 3D-PRNN [48] and PCPNet [49] are tailored for fitting 3D shape primitives and extracting differential surface properties respectively, their application to our problem of detecting nine-DoF primitives in real scenes containing noise, clutter and occlusions is not immediate. To the best of our knowledge, this remains an open challenge. We would also like to stress that the community lacks comprehensive primitive detection datasets, which makes it hard for learning algorithms to grasp all the shape variations of quadrics.
3 Preliminaries
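Definition 1.
A quadric in 3D Euclidean space is a hypersurface defined by the zero set of a polynomial of degree two:
$$Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz + 2Gx + 2Hy + 2Iz + J = 0. \tag{1}$$
Alternatively, the vector notation $\mathbf{v}^\top \mathbf{q} = 0$ is used, where:
$$\mathbf{q} = [A\ B\ C\ D\ E\ F\ G\ H\ I\ J]^\top, \qquad \mathbf{v} = [x^2\ y^2\ z^2\ 2xy\ 2xz\ 2yz\ 2x\ 2y\ 2z\ 1]^\top. \tag{2}$$
Using homogeneous coordinates, quadrics can be analyzed uniformly. The point $\mathbf{x} = (x, y, z) \in \mathbb{R}^3$ lies on the quadric if the projective algebraic equation over $\mathbb{RP}^3$ with $d_{\mathbf{q}}(\mathbf{x}) := [\mathbf{x}^\top\ 1]\,\mathbf{Q}\,[\mathbf{x}^\top\ 1]^\top = 0$ holds true, where the matrix $\mathbf{Q} \in \mathbb{R}^{4 \times 4}$ is defined by re-arranging the coefficients:
$$\mathbf{Q} = \begin{bmatrix} A & D & E & G \\ D & B & F & H \\ E & F & C & I \\ G & H & I & J \end{bmatrix}. \tag{3}$$
$d_{\mathbf{q}}(\mathbf{x})$ can be viewed as an algebraic distance function. Similar to the quadric equation, the gradient at a given point can be written as $\nabla Q(\mathbf{x}) = \nabla\mathbf{v}^\top \mathbf{q}$, where $\nabla\mathbf{v}^\top$ is the $3 \times 10$ Jacobian of $\mathbf{v}$ with respect to $(x, y, z)$. Quadrics are general implicit surfaces capable of representing cylinders, ellipsoids, cones, planes, hyperboloids, paraboloids and potentially the shapes interpolating any two of those. All together there are 17 sub-types [50]. Once $\mathbf{Q}$ is given, this type can be determined from an eigenvalue analysis of $\mathbf{Q}$ and its subspaces [51, 52]. Note that quadrics have constant second order derivatives and are practically smooth.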
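The two notations of Definition 1 can be checked against each other numerically. The following is a minimal sketch in plain Python, assuming the coefficient ordering $A, \dots, J$ with the factor-two convention on the mixed and linear terms; the function names are illustrative and not taken from the paper.

```python
def monomials(x, y, z):
    """Monomial vector v of a 3D point (assumed ordering A..J)."""
    return [x * x, y * y, z * z, 2 * x * y, 2 * x * z, 2 * y * z,
            2 * x, 2 * y, 2 * z, 1.0]

def quadric_matrix(q):
    """Symmetric 4x4 matrix Q obtained by re-arranging the ten coefficients."""
    A, B, C, D, E, F, G, H, I, J = q
    return [[A, D, E, G],
            [D, B, F, H],
            [E, F, C, I],
            [G, H, I, J]]

def algebraic_distance(q, p):
    """Vector form: v^T q."""
    return sum(vi * qi for vi, qi in zip(monomials(*p), q))

def homogeneous_form(q, p):
    """Matrix form: [x^T 1] Q [x^T 1]^T."""
    Q = quadric_matrix(q)
    h = [p[0], p[1], p[2], 1.0]
    return sum(h[i] * Q[i][j] * h[j] for i in range(4) for j in range(4))

def gradient(q, p):
    """Gradient of the quadric polynomial at p."""
    A, B, C, D, E, F, G, H, I, J = q
    x, y, z = p
    return [2 * (A * x + D * y + E * z + G),
            2 * (D * x + B * y + F * z + H),
            2 * (E * x + F * y + C * z + I)]

# Unit sphere x^2 + y^2 + z^2 - 1 = 0 written in the ten-coefficient form.
unit_sphere = [1, 1, 1, 0, 0, 0, 0, 0, 0, -1]
on_sphere = (1.0, 0.0, 0.0)
print(algebraic_distance(unit_sphere, on_sphere))  # zero for a point on the surface
print(gradient(unit_sphere, on_sphere))            # points along the outward normal
```

For a point on the surface the gradient is parallel to the surface normal, which is the property the fits in Section 4 build on.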
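Definition 2.
A quadric whose matrix $\mathbf{Q}$ is of rank 2 consists of the union of two planes: $\mathbf{Q} = \boldsymbol{\Pi}_1\boldsymbol{\Pi}_2^\top + \boldsymbol{\Pi}_2\boldsymbol{\Pi}_1^\top$, where $\boldsymbol{\Pi}_1$ and $\boldsymbol{\Pi}_2$ are the homogeneous 4-vectors representing the two planes. A quadric whose matrix is of rank one consists of a single plane: $\mathbf{Q} = \boldsymbol{\Pi}\boldsymbol{\Pi}^\top$.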
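Definition 3.
The polar plane of a point $\tilde{\mathbf{x}}$ (in homogeneous coordinates) with respect to a quadric $\mathbf{Q}$ is $\boldsymbol{\Pi} = \mathbf{Q}\tilde{\mathbf{x}}$. Reciprocally, $\tilde{\mathbf{x}}$ is called the pole of plane $\boldsymbol{\Pi}$.
Note that if $\mathbf{Q}\tilde{\mathbf{x}} = \mathbf{0}$, then the polar plane does not exist for $\tilde{\mathbf{x}}$; also note that for a point that lies on the quadric, the polar plane is the tangent plane at that point.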
Definition 4.
A quadric is called central if it possesses a finite center point that is the pole of the plane at infinity: e.g. ellipsoids, hyperboloids.
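Definitions 3 and 4 can be illustrated on the unit sphere. The sketch below assumes the symmetric $4 \times 4$ matrix form of the quadric with coefficient ordering $A, \dots, J$ and homogeneous points written as 4-vectors; `polar_plane` is a hypothetical helper name.

```python
def quadric_matrix(q):
    """Symmetric 4x4 matrix of a quadric from its ten coefficients A..J."""
    A, B, C, D, E, F, G, H, I, J = q
    return [[A, D, E, G],
            [D, B, F, H],
            [E, F, C, I],
            [G, H, I, J]]

def polar_plane(q, point_h):
    """Polar plane (homogeneous 4-vector) of a homogeneous point: Q * point."""
    Q = quadric_matrix(q)
    return [sum(Q[i][j] * point_h[j] for j in range(4)) for i in range(4)]

unit_sphere = [1, 1, 1, 0, 0, 0, 0, 0, 0, -1]

# A point on the quadric: its polar plane is the tangent plane x - 1 = 0.
tangent = polar_plane(unit_sphere, [1, 0, 0, 1])

# The center of the sphere: its polar plane has a zero normal part,
# i.e. it is the plane at infinity, so the sphere is a central quadric.
center_polar = polar_plane(unit_sphere, [0, 0, 0, 1])

print(tangent, center_polar)
```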
Definition 5.
A dual quadric $\mathbf{Q}^{*}$ is the locus of all planes $\boldsymbol{\Pi}$ satisfying $\boldsymbol{\Pi}^\top \mathbf{Q}^{*} \boldsymbol{\Pi} = 0$.
The quadric dual space is formed by the Legendre transformation, mapping points to tangent planes as covectors. Given a hypersurface, the tangent space at each point gives a family of hyperplanes, and thus defines a dual hypersurface in the dual space. Every dual point represents a plane in the primal. Many operations such as fitting can be performed in either of the spaces [3], or in the primal space using constraints of the dual. The latter forms a mixed approach, involving tangency constraints. Note that knowing that a point lies on the surface gives one constraint, and if, in addition, one knows the tangent plane at that point, then one gets two more constraints. In this paper, we will use the extra dual constraints to increase the rank of a linear system that solves for the quadric coefficients. This will in return allow us to perform fits with a reduced number of points and thereby to lower the minimum number of required points. In Fig. 1, we provide combinations of primal and dual constraints, each of which leads to a minimal case. Note that, if we have four points and the associated tangent planes, a fit can be formulated.
4 Quadric Fitting to 3D Data
4.1 A new perspective to quadric fitting
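State of the art direct solvers for quadric fitting rely either solely on point sets [4], or use surface normals as regularizers [6]. Both approaches require at least nine points, posing a strict requirement for practical considerations, i.e. using nine points limits the applicability of RANSAC-like fitting algorithms as the space of potential samples is $O(N^9)$, where $N$ is the number of points. Here, we observe that typical real life point clouds make it easy to compute the surface normals (tangent space) and thus provide an additional cue. With this orientation information, we will now explain a closed form fitting requiring only four oriented points.
Similar to gradient-one fitting [53, 6], our idea is to align the gradient vector of the quadric $\nabla Q(\mathbf{x}_i)$ with the normal of the point cloud $\mathbf{n}_i \in \mathbb{R}^3$. However, unlike [53], we opt to use a linear constraint to increase the rank rather than regularizing the solution. This is seemingly non-trivial as the vector-vector alignment brings a non-linear constraint of either form:
$$\frac{\nabla Q(\mathbf{x}_i)}{\|\nabla Q(\mathbf{x}_i)\|} = \mathbf{n}_i \quad \text{or} \quad \frac{\nabla Q(\mathbf{x}_i)}{\|\nabla Q(\mathbf{x}_i)\|} \cdot \mathbf{n}_i = 1. \tag{4}$$
The non-linearity is caused by the normalization as it is hard to know the magnitude and thus the homogeneous scale in advance. We solve this issue by introducing a per normal homogeneous scale $\alpha_i$ among the unknowns and write:
$$\nabla Q(\mathbf{x}_i) = \nabla\mathbf{v}_i^\top \mathbf{q} = \alpha_i \mathbf{n}_i. \tag{5}$$
Stacking this up for all $N$ points $\mathbf{x}_i$ and normals $\mathbf{n}_i$ leads to:
$$\mathbf{A}' \begin{bmatrix} \mathbf{q} \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} \mathbf{v}_1^\top & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ \mathbf{v}_N^\top & 0 & \cdots & 0 \\ \nabla\mathbf{v}_1^\top & -\mathbf{n}_1 & \cdots & \mathbf{0}_3 \\ \vdots & \vdots & \ddots & \vdots \\ \nabla\mathbf{v}_N^\top & \mathbf{0}_3 & \cdots & -\mathbf{n}_N \end{bmatrix} \begin{bmatrix} \mathbf{q} \\ \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix} = \mathbf{0} \tag{6}$$
where $\nabla\mathbf{v}_i^\top = \nabla\mathbf{v}(\mathbf{x}_i)^\top$, $\mathbf{0}_3$ is a $3 \times 1$ column vector of zeros, $\mathbf{A}'$ is $4N \times (N + 10)$ and $\boldsymbol{\alpha} = \{\alpha_i\}$ are the unknown homogeneous scales. The solution containing quadric coefficients and individual scale factors lies in the null-space of $\mathbf{A}'$, and can be obtained accurately via Singular Value Decomposition. Alg. 1 provides a MATLAB implementation of such a fit. For a non-degenerate quadric, the following rank (rk) relations hold: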
[TABLE]
We will now investigate this interesting behavior further.
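To make the layout of the stacked system in Eq. 6 concrete, the following sketch assembles a $4N \times (N+10)$ matrix in plain Python and checks that a known quadric, together with its per-point scales, lies in the null space. It is an illustration under the assumed coefficient ordering $A, \dots, J$, not a port of Alg. 1; the null space itself would in practice come from an SVD routine, which is left out to keep the sketch dependency-free.

```python
def monomials(p):
    x, y, z = p
    return [x * x, y * y, z * z, 2 * x * y, 2 * x * z, 2 * y * z,
            2 * x, 2 * y, 2 * z, 1.0]

def grad_rows(p):
    """Rows of the 3x10 Jacobian of the monomial vector at p."""
    x, y, z = p
    return [[2 * x, 0, 0, 2 * y, 2 * z, 0, 2, 0, 0, 0],
            [0, 2 * y, 0, 2 * x, 0, 2 * z, 0, 2, 0, 0],
            [0, 0, 2 * z, 0, 2 * x, 2 * y, 0, 0, 2, 0]]

def build_stacked_system(points, normals):
    """Stacked matrix: N incidence rows, then 3N gradient-normal rows."""
    n = len(points)
    rows = [monomials(p) + [0.0] * n for p in points]
    for i, (p, nrm) in enumerate(zip(points, normals)):
        for k, g in enumerate(grad_rows(p)):
            scales = [0.0] * n
            scales[i] = -nrm[k]          # column of the i-th unknown scale
            rows.append(g + scales)
    return rows

# Four oriented points on the unit sphere (normals equal the positions).
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
normals = points
A_prime = build_stacked_system(points, normals)

# Known solution: sphere coefficients plus scales alpha_i = |gradient| = 2.
unknowns = [1, 1, 1, 0, 0, 0, 0, 0, 0, -1] + [2.0] * len(points)
residual = [sum(a * u for a, u in zip(row, unknowns)) for row in A_prime]
print(len(A_prime), len(A_prime[0]), max(abs(r) for r in residual))
```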
4.2 Existence of a trivial solution for three-point case
The problem of estimating a quadric from three points and associated normals seems initially to be well-posed: when counting constraints and degrees of freedom, one obtains nine on each side (each point gives one constraint, each normal two, whereas a quadric has nine degrees of freedom). Yet, it turns out that our linear equation system always has a trivial solution besides the true one. This is summarized in Eq. 8 by providing the ranks for different cardinalities of bases. We now give further intuition and proof for this behavior:
Theorem 1.
Three-oriented point quadric fitting, as formulated, possesses a trivial solution besides the true solution, namely the plane spanned by the three data points. The fitting problem thus has at least a one-dimensional linear family of solutions, spanned by the true quadric and this trivial solution.
Proof.
In the following, let us call data-plane the plane spanned by the three data points (coordinates only, i.e. not considering the associated normals). We illustrate this in Fig. 2 (left). As stated in Definition 2, any rank-1 quadric consists of a single plane and can be written as $\mathbf{Q} = \boldsymbol{\Pi}\boldsymbol{\Pi}^\top$. Hence, for any point $\tilde{\mathbf{x}}$ on the plane and thus on the quadric, we have $\mathbf{Q}\tilde{\mathbf{x}} = \boldsymbol{\Pi}(\boldsymbol{\Pi}^\top\tilde{\mathbf{x}}) = \mathbf{0}$. In our formulation of the fitting problem, this amounts to $\nabla\mathbf{v}_i^\top \mathbf{q} = \mathbf{0}$ for all three data points, i.e. the gradient part of all the gradient-normal correspondence equations, stacked together (lower part of Eq. 6), vanishes. We also have $\mathbf{v}_i^\top \mathbf{q} = 0$ due to the points lying on the quadric. This means that the following vector is a solution of the equation system: coefficients $A$ to $J$ are those of the rank-1 quadric and the three scalars $\alpha_i$ are zero. In other words, the trivial solution is identified as the rank-1 quadric consisting of the data-plane. Hence, the estimation problem admits at least a one-dimensional linear family of solutions, spanned by the true quadric and the rank-1 quadric of the data-plane. In some cases, the dimension of the family of solutions may be higher (such as when the true quadric is a plane). ∎
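The trivial solution of Theorem 1 is easy to verify numerically. The sketch below, written in plain Python with illustrative helper names, builds the rank-1 quadric of the data-plane for three arbitrary points and checks that both its value and its gradient vanish at all three of them, whatever normals are attached. The coefficient ordering $A, \dots, J$ is an assumption of the sketch.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def data_plane(p1, p2, p3):
    """Homogeneous 4-vector (a, b, c, d) of the plane through three points."""
    n = cross([p2[i] - p1[i] for i in range(3)],
              [p3[i] - p1[i] for i in range(3)])
    d = -sum(n[i] * p1[i] for i in range(3))
    return [n[0], n[1], n[2], d]

def rank1_quadric(plane):
    """Ten coefficients A..J of Q = Pi Pi^T, i.e. (ax + by + cz + d)^2."""
    a, b, c, d = plane
    return [a * a, b * b, c * c, a * b, a * c, b * c, a * d, b * d, c * d, d * d]

def value_and_gradient(q, p):
    A, B, C, D, E, F, G, H, I, J = q
    x, y, z = p
    val = (A * x * x + B * y * y + C * z * z + 2 * D * x * y + 2 * E * x * z
           + 2 * F * y * z + 2 * G * x + 2 * H * y + 2 * I * z + J)
    grad = [2 * (A * x + D * y + E * z + G),
            2 * (D * x + B * y + F * z + H),
            2 * (E * x + F * y + C * z + I)]
    return val, grad

basis = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
trivial = rank1_quadric(data_plane(*basis))
checks = [value_and_gradient(trivial, p) for p in basis]
print(checks)  # value and gradient are zero at every basis point
```

Since the gradient is zero, the gradient-normal equations are satisfied with all scales set to zero, independently of the normals.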
4.2.1 A geometric explanation of the fact that the three-oriented-point problem is always under-constrained
Despite the analytical proof, it is puzzling that nine constraints on nine unknowns are never sufficient in our problem. Moreover, we may wonder if the existence of a trivial solution is due to our linear problem formulation or if it is generic. It turns out that this is generic and can be explained geometrically. To make it easier to imagine, our description will closely follow Fig. 2. Let us decompose the estimation of the quadric in two parts, the first part being the determination of the quadric’s intersection with the data-plane. The intersection of any quadric with any plane is in general always a conic (shown as black curve), be it real or imaginary (the only exception is when the quadric itself contains the data-plane entirely, in which case the “intersection” is the entire plane).
Let us examine which constraints we have at our disposal to estimate the intersection conic. First, the three data points lie on the conic. Second, we know the tangent planes at the data points, to the true quadric. Let us intersect the three tangent planes with the data plane – the resulting three lines (shown in purple) must be tangent to our conic (the only exception occurs when one or more of these tangent planes are identical to the data plane).
Hence, we know three points on the conic and three tangent lines – the problem of estimating the conic is thus in general overdetermined by one DoF. In other words, six of the nine constraints at our disposal for estimating the quadric are dedicated to estimating the five degrees of freedom of its intersection with the data plane. Hence, the remaining three constraints are not sufficient to complete the estimation of the quadric.
What are these three remaining constraints? They refer to the orientation of the tangent planes: each of the tangent planes is defined by an angle expressing the rotation about its intersection line with the data plane. This angle gives one piece of information on the quadric; for three oriented points we thus have our three remaining constraints.
Note that the three tangent planes to the quadric intersect in the quadric’s pole to the data plane (see Fig. 2). Hence, we can determine this pole which, as shown in the appendix, lies on the line joining the centers of the possible solutions for the quadric.
Let us also note that the fact that six pieces of information (three data points and three tangent lines to a conic in the data plane) only constrain five degrees of freedom means that these six pieces of information are not independent from one another: in the absence of noise or other errors, they must satisfy a consistency constraint (the fact that they define a conic). In the presence of noise, the input information will not satisfy this constraint, meaning that a perfect fit will not exist. This is different in most so-called minimal estimation problems in geometric computer vision (such as three-point pose estimation - P3P), where the computed solution is perfectly consistent with the input data. In our case, we can expect that the computed quadric will not satisfy all constraints exactly, i.e. will not necessarily be incident with all data points or be exactly tangent to the given tangent planes.
This gives room to different formulations for the problem, depending on how one quantifies the quality of fit. For instance, one possibility would be to impose that the quadric goes exactly through the data points, but that the tangency is only approximately fulfilled by computing the intersection conic in the data plane and minimizing some cost functions over the tangent lines.
4.3 Regularizing with gradient norm
The quadric fitting problem, like many others (e.g. calibration, projective reconstruction), is intrinsically of non-linear nature, meaning that a “true” Maximum Likelihood Estimation or Maximum A Posteriori solution, minimizing a geometric distance, cannot be achieved by a linear fit. However, our main objective in this stage is a sufficiently close and computationally efficient fit, using as few points as possible and upon which we can build our voting scheme. Despite its sparsity, for such a purpose, the formulation in § 4.1 still remains suboptimal since the unknowns in Eq. 6 scale linearly with $N$, leaving a large system to solve. In practice, analogous to gradient-one fitting [53], we could prefer unit-norm polynomial gradients, and thus can write $\alpha_i = 1$ or equivalently $\alpha_i \leftarrow \bar{\alpha}$, one common factor. This soft constraint will try to force the zero set of the polynomial to respect the local continuity of the data. A similar direction is also taken by [54] for the case of spheres. However, there, the authors follow a two-step fitting process, solving first the gradient and then the positional constraint, whereas we formulate a single system solving for all the shape parameters simultaneously. Such regularization also saves us from solving the sensitive homogeneous system [20], and lets us re-write the system in a more compact form $\mathbf{A}\mathbf{q} = \mathbf{n}$:
$$\mathbf{A} = \begin{bmatrix} \mathbf{v}_1^\top \\ \vdots \\ \mathbf{v}_N^\top \\ \nabla\mathbf{v}_1^\top \\ \vdots \\ \nabla\mathbf{v}_N^\top \end{bmatrix}, \qquad \mathbf{n} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \mathbf{n}_1 \\ \vdots \\ \mathbf{n}_N \end{bmatrix}, \qquad \mathbf{A}\mathbf{q} = \mathbf{n}. \tag{9}$$
Here $\mathbf{A}$ is only $4N \times 10$ and $\mathbf{n}$ is $4N \times 1$; $\mathbf{A}$ is similar to the $\mathbf{A}'$ in § 4.1 and gets full rank for four or more oriented points. In fact, it is not hard to show that the equations in the rows of $\mathbf{A}$ are linearly dependent, which is why we get diminishing returns when we add further constraints. Note that by removing the scale factors from the solution, we also solve the sign ambiguity problem, i.e. the solution to Eq. 6 can result in negated gradient vectors. To balance the contribution of normal induced constraints we introduce a scalar weight $w$, leading to the ten-liner MATLAB implementation as provided in Alg. 2.
In certain cases, to obtain a type-specific fit, a minor redesign of $\mathbf{A}$ tailored to the desired primitive suffices (see § 6.3.4). If outliers corrupt the point set, a four-point RANSAC could be used. However, below, we present a more efficient way to calculate a solution to Eq. 9 than using a naive RANSAC on four-tuples, by analyzing its solution space. The next section can also be used as a generic method to solve any fitting problem formulated as a linear system more efficiently.
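The regularized fit can be sketched in a few lines. The version below is not the MATLAB code of Alg. 2: it is a plain-Python illustration that solves the system of Eq. 9 in the least-squares sense via the normal equations, with the weight $w$ applied to the gradient rows. The helper names, the coefficient ordering $A, \dots, J$ and the use of normal equations instead of a dedicated least-squares routine are assumptions made to keep the example dependency-free.

```python
def monomials(p):
    x, y, z = p
    return [x * x, y * y, z * z, 2 * x * y, 2 * x * z, 2 * y * z,
            2 * x, 2 * y, 2 * z, 1.0]

def grad_rows(p):
    x, y, z = p
    return [[2 * x, 0, 0, 2 * y, 2 * z, 0, 2, 0, 0, 0],
            [0, 2 * y, 0, 2 * x, 0, 2 * z, 0, 2, 0, 0],
            [0, 0, 2 * z, 0, 2 * x, 2 * y, 0, 0, 2, 0]]

def solve(M, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def fit_quadric(points, normals, w=1.0):
    """Least-squares solution of A q = n with weighted gradient rows."""
    rows, rhs = [], []
    for p, n in zip(points, normals):
        rows.append(monomials(p))
        rhs.append(0.0)
        for g, nk in zip(grad_rows(p), n):
            rows.append([w * gi for gi in g])
            rhs.append(w * nk)
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(10)] for i in range(10)]
    Atn = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(10)]
    return solve(AtA, Atn)

# Four non-coplanar oriented points on a sphere (center and radius are test values).
center, radius = (0.5, -0.5, 0.25), 1.0
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
points = [tuple(c + radius * d for c, d in zip(center, u)) for u in directions]
q = fit_quadric(points, directions)
recovered_center = [-q[6] / q[0], -q[7] / q[0], -q[8] / q[0]]
print(q[0], recovered_center)
```

Because the gradient is asked to match unit normals, the recovered coefficients are scaled so that the leading sphere coefficient equals $1/(2r)$; the four points must not be coplanar, otherwise the normal equations become singular.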
5 Quadric Detection in Point Clouds
We now factor in clutter and occlusions into our formulation and explain a new pipeline to detect quadrics in 3D data.
Definition 6.
A basis $b$ is a subset composed of a fixed number of scene points and hypothesized to lie on the sought surface.
Our algorithm operates by iteratively selecting bases from an input scene. Once a basis is fixed, an under-determined quadric fit parameterizes the solution and attached to this basis, a local accumulator space is formed. All other points in the scene are then paired with this basis to vote for the potential primitive. To discover the optimal basis, we perform RANSAC, iteratively hypothesizing different basis candidates and voting locally for probable shapes. Subsequent to such joint RANSAC and voting, we verify resulting hypotheses with efficient two-stage clustering and score functions such that multiple quadrics can be detected without repeated executions of the algorithm. We will now describe, in detail, the voting and the bases selection, respectively.
5.1 Parameterizing the solution space
Linear system in Eq. 6 describes an outlier-free closed form fit. To treat the clutter in the scene, a direct RANSAC on nine-DoF quadric appears to be trivial. Yet, it has two drawbacks:
- evaluating the error function many times is challenging, as it involves a scene-to-quadric overlap calculation in a geometrically meaningful way.
- even with the proposed fitting, selecting random four-tuples from the scene might be slow in practice.
An alternative to RANSAC is Hough voting. However, $\mathbf{q}$ has nine DoFs and is not discretization friendly. The complexity and size of this parameter space make it hard to construct a voting space. Instead, we will now devise a local search. For this, let $\mathbf{q}$ be a solution to the linear system in (9) and $\mathbf{p}$ be a particular solution. $\mathbf{q}$ can be expressed by a linear combination of homogeneous solutions $\boldsymbol{\mu}_i$ as:
$$\mathbf{q} = \mathbf{p} + \sum_{i=1}^{D} \lambda_i \boldsymbol{\mu}_i = \mathbf{p} + \mathbf{N}_A \boldsymbol{\lambda}. \tag{10}$$
The dimensionality $D$ of the null space $\mathbf{N}_A$ depends on the rank of $\mathbf{A}$, which is directly influenced by the number of points used: $D = 10 - \mathrm{rk}(\mathbf{A})$. The exact solution could always be computed by including more points from the scene and validating them, i.e. by a local search. For that reason, the fitting can be split into distinct parts: first a parametric solution is computed, such as in Eq. 10, using a subset of points which lie on a quadric. We refer to this subset as the basis. Next, the coefficients $\boldsymbol{\lambda}$, and thus the solution $\mathbf{q}$, can be obtained by searching for other point(s) which lie on the same surface as the basis.
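Proposition 1.
If two oriented point sets $\mathcal{X}_1$ and $\mathcal{X}_2$ lie on the same quadric with parameters $\mathbf{q}$, then the coefficients $\boldsymbol{\lambda}$ of the solution space (10) are given by the solution of the system:
$$(\mathbf{A}_2 \mathbf{N}_A)\,\boldsymbol{\lambda} = \mathbf{n}_2 - \mathbf{A}_2 \mathbf{p}, \tag{11}$$
where $\mathbf{A}_2$, $\mathbf{n}_2$ are the linear constraints of the latter set in form of (9), $\mathbf{p}$ is a particular solution and $\mathbf{N}_A$ is a stacked null-space basis as in (10), obtained from $\mathbf{A}_1$.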
Proof.
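Let $\mathbf{q}$ be a quadric solution for the point set $\mathcal{X}_1$ and let $(\mathbf{A}_2, \mathbf{n}_2)$ represent the quadric constraints for the points $\mathcal{X}_2$ in form of (9) with the same parameters $\mathbf{q}$. As $\mathcal{X}_2$ by definition lies on the same quadric $\mathbf{q}$, it also satisfies $\mathbf{A}_2 \mathbf{q} = \mathbf{n}_2$. Inserting Eq. 10 into this, we get:
$$\mathbf{A}_2(\mathbf{p} + \mathbf{N}_A \boldsymbol{\lambda}) = \mathbf{n}_2 \tag{12}$$
$$(\mathbf{A}_2 \mathbf{N}_A)\,\boldsymbol{\lambda} = \mathbf{n}_2 - \mathbf{A}_2 \mathbf{p}. \tag{13}$$
Solving Eq. 13 for $\boldsymbol{\lambda}$ requires a multiplication of a $4|\mathcal{X}_2| \times 10$ matrix with a $10 \times D$ one and ultimately solving a system of $4|\mathcal{X}_2|$ equations in $D$ unknowns. Once $\mathbf{p}$ and $\mathbf{N}_A$ are precomputed, it is much more efficient to evaluate Eq. 11 for $\boldsymbol{\lambda}$ rather than re-solving the system (9). This resembles updating the solution online for a stream of points. For our case, the amount of streamed points will depend on the size of the basis, as explained below.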
5.2 Local voting for quadric detection
Given a fixed basis composed of points as in Fig. 3, a parametric solution can be described. The actual solution can then be found quickly by using Prop. 1 by incorporating new points lying on the same quadric as the basis. Thus, the problem of quadric detection is de-coupled into
- finding a proper basis and
- searching for compatible scene points.
In this section, we assume the basis is correctly found and explain the search by voting. For a fixed basis on a quadric, we form the null-space decomposition of the under-determined system . We then sample further points from the scene and compute the required coefficients . Thanks to Prop. 1, this can be done efficiently. Sample points lying on the same quadric as the basis (inliers) generate the same whereas outliers will produce different values. Therefore we propose to construct a voting space on attached to basis and cast votes to maximize the consensus, only up to the locality of the basis. Fig. 3 illustrates this configuration. The size of the voting space is a design choice and depends on the size of the basis vs. the DoFs desired to be recovered (see Fig. 1).
While many choices for the basis cardinality are possible (and the formulation in § 5.1 allows for all), we find from Fig. 1 that using a three-point basis is advantageous for a generic quadric fit: having three dual points reduces the minimum number of required primal (incidence) constraints to only four. By the rank analysis given in Eq. 8, it is then possible to trade one point for a 1D local search, as opposed to two points for a 3D search in the five-point case.
5.3 Efficient computation of voting parameters for a 1D voting space
Adding a fourth sample point completes and a unique solution can be computed, as described above. Yet, as we will select multiple candidates per basis, hypothesized in a RANSAC loop, an efficient scheme is required, i.e. once again, it is undesirable to re-solve the system in Eq. 9 for each incoming tied to the basis. It turns out that the solution can be obtained directly from Eq. 10:
Proposition 2.
If the null-space is one dimensional (with only 1 unknown) it holds and the computation in Prop. 1 reduces to the explicit form:
[TABLE]
Proof.
Let us re-write Eq. 13 in terms of the null space vectors: . A solution can be obtained via Moore-Penrose pseudoinverse as . Because for one-dimensional null spaces, is a vector (), for which the + operator is defined as: . Substituting this in Eq. 11 gives Eq. 14. ∎
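For a one-dimensional null space, the pseudoinverse of a column vector has a closed form, so the coefficient reduces to a ratio of two dot products. The sketch below is illustrative only: `a` stands for the stacked new constraints applied to the single null-space vector and `r` for the corresponding residual of the particular solution. Both names are assumptions made for this example.

```python
import numpy as np

def lambda_closed_form(a, r):
    """Least-squares solution of a * lam = r for a single column vector a.

    Uses the identity pinv(a) = a^T / (a^T a), valid for any non-zero vector.
    """
    a = np.ravel(np.asarray(a, dtype=float))
    r = np.ravel(np.asarray(r, dtype=float))
    return float(a @ r) / float(a @ a)
```

No matrix factorization is needed per sample point, which is why this form suits an inner voting loop.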
Prop. 2 enables a very quick computation of the parameter hypothesis in the case of an additional single oriented point. A MATLAB implementation takes ca. per . Note that for the minimal system we propose, four incidence (primal) and three tangent plane alignment (dual) constraints are sufficient. This means that the normal of the fourth sample point does not contribute to the set of constraints for a minimal fit. Hence, we use this piece of information for the verification of the fit. We only accept to vote a candidate quadric if the gradient of the fitted surface agrees with the surface normal of the fourth point:
[TABLE]
We typically set in order to tolerate certain noise.
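The normal-based verification can be illustrated with a short sketch. It assumes the quadric is stored as a symmetric 4x4 matrix `Q` acting on homogeneous points, and it uses a cosine threshold `min_cos` whose value is a placeholder rather than the setting used in the paper. Whether the sign of the normal should be ignored is also left as an assumption here.

```python
import numpy as np

def quadric_gradient(Q, x):
    """Gradient of f(x) = [x, 1]^T Q [x, 1] for a symmetric 4x4 matrix Q."""
    xh = np.append(np.asarray(x, dtype=float), 1.0)
    return 2.0 * (Q @ xh)[:3]

def normal_agrees(Q, x, normal, min_cos=0.9):
    """Accept a candidate only if the gradient direction matches the normal."""
    g = quadric_gradient(Q, x)
    g_norm = np.linalg.norm(g)
    n_norm = np.linalg.norm(normal)
    if g_norm == 0.0 or n_norm == 0.0:
        return False  # direction undefined, reject the candidate
    return float(g @ normal) / (g_norm * n_norm) >= min_cos
```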
5.4 Quantizing for voting
Unfortunately, is not quantization-friendly, as it is unbounded and has a non-linear effect on the quadric shape (Fig. 4). Thus, we seek to find a geometrically meaningful transformation to a bounded and well behaving space so that quantization would lead to little bias and artifacts. From a geometric perspective, each column of in Eq. 10 is multiplied by the same coefficient , corresponding to the slope of a high dimensional line in the solution space. Thus, it could as well be viewed as a rotation. For 1D null-space, we set:
[TABLE]
where and is obtained by moving in the direction from the particular solution by an offset . (Footnote 1: Simple could work but would be more limited in the range.) This new angle is bounded and thus easy to vote for. As the null-space dimension grows, starts to represent hyperplanes, still preserving the geometric meaning, i.e. for , different can be found.
Even though behaves better than for voting, we still cannot guarantee a unimodal distribution such that a single peak can be identified unambiguously. Nevertheless, thanks to the local voting, the case that one distribution is noisy or misty will be handled when other random bases are selected. It is more likely that the peaks coming from different bases are concentrated around the same mode, rather than a single peak of one accumulator. Besides, we have empirically observed that in many real cases, even when the distribution is amodal, a single peak is prominent when the sampled fourth point is in a reasonable vicinity of the basis.
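The idea of voting in a bounded angular space can be illustrated with a small accumulator. Note the assumption: this sketch uses a plain arctangent as a stand-in for the offset-based angle defined above, so it shows the mechanics of bounded 1D voting rather than the exact transformation. The function name and bin count are hypothetical.

```python
import numpy as np

def vote_angle(lams, n_bins=64):
    """Map unbounded coefficients to angles in (-pi/2, pi/2) and find the peak.

    Returns the centre of the most voted bin and the full accumulator.
    """
    theta = np.arctan(np.asarray(lams, dtype=float))
    hist, edges = np.histogram(theta, bins=n_bins,
                               range=(-np.pi / 2, np.pi / 2))
    k = int(np.argmax(hist))
    centre = 0.5 * (edges[k] + edges[k + 1])
    return centre, hist
```

Inliers that produce the same coefficient land in one bin, while outliers spread over the accumulator, so the peak bin recovers the consensus value up to the bin width.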
5.5 Hypotheses aggregation
Up until now, we have described how to find plausible quadrics given local triplet bases. As mentioned, to discover the basis lying on the surface, we employ RANSAC [56], where each triplet might generate a hypothesis to be verified. Many of those will be similar as well as dissimilar. Thus, the final stage of the algorithm aggregates the potential detections to reduce the number of candidate surfaces and to increase the per-quadric confidence. To avoid sacrificing further speed, we run an agglomerative clustering similar to [1] in a coarse-to-fine manner: first, a fine (close) but fast distance measure helps to cluster the obvious hypotheses. Second, a coarse (far) one is executed on these cluster centers.
Definition 7.
Our distance computation is two-fold: Whenever two quadrics are close, we approximate their distances as in Eq. 17 (), where is the identity matrix and the indicator function. We use the pseudoinverse just to handle singular configurations. If the shapes are far, such manifold-distance becomes erroneous and we use a globally consistent metric. To do so, we define a more geometric-meaningful distance using the points on the scene itself ():
[TABLE]
* denote the scene samples.*
Note that, algebraic but efficient lacks geometric meaning, while slower can, to a certain extent, explain the geometry. Finally, the quadrics are sorted w.r.t. their scores, evaluated pseudo-geometrically by point and normal-gradient compatibility according to Def. 8:
Definition 8.
The score of a quadric is defined to be:
[TABLE]
While other distance metrics, such as spectral decompositions are possible, we found these to be sufficient in our experiments. The final algorithm is summarized in Alg. 3.
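The two-stage aggregation can be sketched with a generic greedy clustering routine. This is an assumption-laden illustration: the distance functions of Def. 7 are not reproduced, so `dist` is a caller-supplied placeholder, the first member of each cluster serves as its representative, and the thresholds are hypothetical.

```python
import numpy as np

def greedy_cluster(items, dist, thresh):
    """Assign each item to the first cluster whose representative is close.

    Returns the representatives and the member lists of all clusters.
    """
    reps, members = [], []
    for item in items:
        for k, rep in enumerate(reps):
            if dist(item, rep) < thresh:
                members[k].append(item)
                break
        else:
            # No existing cluster is close enough, so open a new one.
            reps.append(item)
            members.append([item])
    return reps, members

def two_stage_cluster(items, fast_dist, slow_dist, t_fast, t_slow):
    """Cluster with a cheap distance first, then merge the representatives."""
    reps, _ = greedy_cluster(items, fast_dist, t_fast)
    return greedy_cluster(reps, slow_dist, t_slow)
```

Running the expensive distance only on the representatives of the first pass keeps the number of costly comparisons small.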
6 Experimental Evaluation and Discussions
6.1 Implementation details
Prior to operation, we normalize the point coordinates to lie in a unit ball to increase the numerical stability [57]. Next, we downsample the scene using a spatial voxel-grid enforcing a minimum distance of between the samples () [58]. The required surface normals are computed by local plane fitting [59]. As planes are singular quadrics and occupy large spaces of 3D scenes, we remove them. To do so, we convert our algorithm to a type-specific plane detector, which happens to be a similar algorithm to [30]. Next, influenced by the smoothness of quadrics, we use Difference of Normals (DoN) [60] to prune the points not located on smooth regions. What follows is an iterative selection of triplets to conduct the three-point RANSAC: We first randomly draw the initial point of the basis . Once is fixed, we query the points in a large enough vicinity, whose normals differ enough to form the three-point basis . The rest of the points are then randomly selected respecting these criteria. To avoid degenerate configurations, we skip the basis if it does not result in a rank-9 matrix . In addition, to reduce the bias towards repeated bases, we hash the seen triplets and avoid duplicates.
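A few of these preprocessing steps can be sketched in NumPy. The code below is a simplified illustration under stated assumptions: all function names are hypothetical, the voxel grid keeps one point per occupied cell (which only approximates a minimum-distance constraint), and triplets are treated as unordered when hashing.

```python
import numpy as np

def normalize_to_unit_ball(points):
    """Centre the cloud and scale it so the farthest point has norm 1."""
    centre = points.mean(axis=0)
    centred = points - centre
    scale = np.linalg.norm(centred, axis=1).max()
    return centred / scale, centre, scale

def voxel_downsample(points, voxel):
    """Keep the first point that falls into each occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first)]

def sample_unique_triplets(n_points, n_triplets, seed=0, max_tries=10000):
    """Draw random index triplets, skipping those already seen."""
    rng = np.random.default_rng(seed)
    seen, out = set(), []
    for _ in range(max_tries):
        if len(out) == n_triplets:
            break
        idx = rng.choice(n_points, size=3, replace=False).tolist()
        key = tuple(sorted(idx))  # unordered hash key for the triplet
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out
```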
6.2 Synthetic tests of fitting and ablation studies
To assess the accuracy of the proposed fitting, we generate a synthetic test set of multitudes of random quadrics and compare our method with the fitting procedures of Taubin [4], Tasdizen [6], Andrews [5], and Beale [33]. We propose two variants: Ours full will refer to Alg. 1, whereas Ours is the regularized one (Alg. 2).
6.2.1 Quantitative assessments
Prior to each run, we add Gaussian noise to the ground-truth vertices with relative to the size of the quadric. At each noise level, ten random quadrics are tested. We perform not one but twenty fits per set. For the constrained fitting method [5] we pre-specify the type, which might not be possible in a real application. We then record and report the average point-to-mesh distance and the angle deviation as well as the runtime performance in Fig. 5. Although our fit is designed to use a minimal number of points, it also proves robust when more points are added and is among the top fitters for the distance and angle errors. In addition, Fig. 5c shows the errors on the gradient magnitudes obtained by our quadrics. We achieve the lowest errors, showing that gradient norms align well with the ground truth, which supports the validity of our approximation/regularization. Next, looking at the noise assessments, we see that our full method performs the best at low noise levels but quickly destabilizes. This is because the system might be biased to compute correct norms rather than the solution and it has more parameters. We believe the reason for our compact fit to work well is the soft constraint where the common scale factor acts as a weighted regularizer towards special quadrics. When this constraint cannot be satisfied, the solution settles for a very acceptable shape.
In a further test, we include the six neighboring points of each of seven query points to perform a standard Taubin-fit. We call this Taubin-42. Fig. 6a shows that while the error of our method is on par with Taubin-42, we are more robust at higher noise values and more efficient with a runtime advantage of ca. .
Since normal alignment is crucial for a visually appealing fit, we next present a qualitative evaluation.
6.2.2 Qualitative assessments
We synthesized a random saddle quadric and performed a random point sampling over its surface. Next, we added Gaussian noise on the sample points and computed the normals. To resolve the sign ambiguity, each normal is flipped in the direction of the ground truth gradient. We plot the results of the fitting in Fig. 7. Even in the presence of only a little noise, some methods fail to estimate the correct geometry, mostly due to the bias towards a certain shape [5, 33]. Our approach is able to recover the correct surface even in the presence of severe noise. The effect of our regularization is also visible in the last column, which possesses the best visual quality.
It is of interest to see whether our regularized fit can estimate correct surface normals as well as direction. Thus, a second test was performed to qualitatively observe the gradient properties in more detail. For this, a series of randomly generated quadrics is fitted by Taubin’s and our method and the gradients are analyzed both in terms of magnitude and phase, as shown in Fig. 9.
Due to our explicit treatment of the gradients, it can be clearly seen that the gradient direction is recovered better. Moreover, the right side of Fig. 9 also shows that our approximate approach yields the expected results, while the full method could sometimes generate inconsistent gradient signs, as the scale factors are estimated individually. Finally, it is qualitatively visible in Fig. 9 that the magnitudes recovered by our method are compatible with the ground truth. Such improvement without sacrificing gradient quality validates the regularizing nature of our approach.
6.2.3 Is a valid transformation for
To assess the practical validity of the quantization, we collect a set of 2.5 million oriented point triplets from several scenes and use them as bases to form the underdetermined system . We then sample the fourth point from those scenes, compute and establish the probability distribution for the whole collection to calculate the quantiles, mapping to bins via the inverse CDF. A similar procedure has been applied to cross ratios in [61]. We plot the findings together with the function in Fig. 6b and show that the empirical distribution and follow similar trends, justifying that our quantizer is well suited to the data.
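The quantile-based binning used in this check can be sketched as follows. The function names are hypothetical and the samples are arbitrary scalars. The sketch only shows how an empirical distribution is turned into equal-mass bins through its inverse CDF.

```python
import numpy as np

def quantile_edges(samples, n_bins):
    """Bin edges that split the empirical distribution into equal-mass bins."""
    levels = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(np.asarray(samples, dtype=float), levels)

def to_bin(value, edges):
    """Index of the equal-mass bin that contains value, clipped to the range."""
    k = np.searchsorted(edges, value, side="right") - 1
    return int(np.clip(k, 0, len(edges) - 2))
```

If a fixed analytic quantizer places its bin boundaries close to these empirical edges, it distributes the votes roughly uniformly over the accumulator, which is the property tested in this section.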
6.2.4 Effect of weighting on the fit
We now investigate the effect of weighting parameter on the fit. For a selection of eight noisy points, located on three different synthetic quadrics, we vary and plot, in Fig. 8a, the geometric errors attained by Alg. 2, against the ground truth and Taubin fit. While too low of hurts our fit, there is a large range of values , where we can outperform [4].
6.2.5 What do voting spaces look like?
To provide insights on the local voting spaces of the angles , we sample different random bases on four synthetic quadrics as embedded in Fig. 8b, and collect the votes along with the quantized bins. These accumulators are shown in the same figure, each with a different color. We observe that the voting spaces are mist-free and only a single mode emerges, thanks to the maximum distance threshold selected between the basis and the paired point. It is still possible to obtain multiple modes if the threshold is unrealistically picked. The consensus votes correspond to the true shape, and erroneous votes spread randomly.
6.3 Real experiments on quadric detection
Besides synthetic tests where self evaluation was possible, we assess the quality of generic primitive detection, on 3 real datasets:
1. Our Dataset. First, because there are no broadly accepted datasets on quadric detection, we opt to collect our own. To do so, we use an accurate phase-shift stereo structured light scanner and capture 35 3D scenes of 5 different objects within clutter and occlusions. Our objects are three bending papers, a helmet, a paper towel and a cylindrical spray bottle. Other objects are included to create clutter. To obtain the ground truth, for each scene, we generated a visually acceptable set of quadrics 1) using [23] when shapes represent known primitives, and 2) by segmenting the cloud manually and performing a fit when the quadric type is not available. Each scene then contains 1-3 ground truth quadrics. This dataset has low noise, but a high amount of clutter and partial visibility due to the FOV limitations of the sensor.
2. Large Objects. The Kinect sensor is widely used in the computer vision community. Thus, it is desirable to see the performance of our generic and type-specific fit approaches on Kinect depth images. To this end, we adapt the large objects RGB-D scan dataset of [62]. From this dataset, we sample only the scenes containing objects that could roughly be explained by geometric primitives. These scenes include apples, globes, footballs, or other small balls. Tab. I summarizes the objects used. Example detections are also shown in Fig. 12. We also augment this dataset with a Pilates Ball sequence that we collect. This sequence involves a lot of partial visibility, clutter and fast motions (see appendix).
3. Cylinders. Finally, we use a subset of the ITODD dataset [63], designed to evaluate object pose estimators. Our subset, Cylinders, includes 14 scenes with a varying number of cylinders, from one to ten, as shown in Fig. 13a. Again, we use RGB images only to ease the visual perception.
6.3.1 Evaluations on detection accuracy
To assess the detection accuracy, we manually count the number of detected quadrics aligning with the ground truth in Our Dataset. We compared the four-point and three-point algorithms, both of which we propose. We also tried the naive nine-point RANSAC algorithm (with [4]), but found it to be infeasible when an initial hypothesis of the inlier set is not available. Fig. 10 visualizes the detected quadrics both on our dataset and on the 3D data captured by Kinect. Fig. 11(a) presents our accuracy over different sampling rates and the runtime performance. Our three-point method is on par with the four-point variant in terms of detection accuracy, while being significantly faster. Next, we also evaluate our detector on the large objects dataset of [62] without further tuning. Tab. I shows accuracy in locating a frontally appearing ellipsoidal rugby ball over a frame sequence without type prior. While such scenes are not particularly difficult, it is noteworthy that we manage to generate a similar quadric repeatedly at each frame within of the quadric diameter.
6.3.2 How fast is it?
As our speed is influenced by the factors of closed form fitting, RANSAC and local voting, we evaluate the fit and detection separately. Fig. 5d shows the runtime of fitting part. Our method scales linearly due to the solution of an system, but it is the fastest approach when points are used. Thus, it is more preferred for a minimal fit. Fig. 11(a) then presents the order of magnitude speed gain, when our four-point C++ version is replaced by three-points without accuracy loss. Although the final runtime is in the range of 1-2 seconds, our three-point algorithm is still the fastest known method in segmentation free detection.
6.3.3 How accurate is the fit?
To evaluate the pose accuracy on real objects, we use closed geometric objects of known size from the aforementioned datasets and report the distribution of the errors, and their statistics. We choose football and pilates ball 1 as it is easy to know their geometric properties (center and radius). We compare the radius to the true value while the center is compared to the one estimated from a non-linear refinement of the sphere. Our results are depicted in Fig. 11(b). Note that the errors successfully remain about the used sampling rates (), which is the best we could expect.
6.3.4 Type-specific detection
It is remarkably easy to convert our algorithm to a type specific one by re-designing matrix . Here, we propose a sphere-specific detector. Let us write any sphere in the following matrix form:
[TABLE]
where and are the geometric parameters (center and radius) of the sphere. Rotation does not affect spheres and our formulation in § 4.3 then simplifies to:
[TABLE]
Due to the geometric interpretability, at the scoring phases, we can use the point-to-sphere distance as:
[TABLE]
where is the point to compute the distance. A sphere to sphere distance (used in clustering) can be obtained by:
[TABLE]
Center and radius of the sphere can always be obtained from the quadric form as described in eq. 18.
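The conversion between the geometric and quadric forms of a sphere, together with the point-to-sphere distance, can be sketched as below. This assumes the common convention that the sphere is the zero set of [x, 1]^T Q [x, 1] with Q a symmetric 4x4 matrix defined up to scale. The function names are hypothetical and the matrix layout may differ from the one used in the equations above.

```python
import numpy as np

def sphere_to_quadric(c, r):
    """4x4 matrix Q with [x,1]^T Q [x,1] = ||x - c||^2 - r^2."""
    c = np.asarray(c, dtype=float)
    Q = np.eye(4)
    Q[:3, 3] = -c
    Q[3, :3] = -c
    Q[3, 3] = c @ c - r * r
    return Q

def quadric_to_sphere(Q):
    """Recover centre and radius from a sphere quadric given up to scale."""
    Q = Q / Q[0, 0]  # remove the arbitrary scale
    c = -Q[:3, 3]
    r = float(np.sqrt(c @ c - Q[3, 3]))
    return c, r

def point_to_sphere_distance(x, c, r):
    """Unsigned Euclidean distance from point x to the sphere surface."""
    return abs(np.linalg.norm(np.asarray(x, dtype=float) - c) - r)
```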
Note that if one point is available, leaving only one free parameter which forms a single dimensional null-space. Geometrically, this means that the radius cannot be resolved from a single point. Yet, by fixing another point, one can vote locally as explained in § 5.2. While at this stage Drost and Ilic [30] prefer to vote for the radius explicitly, we vote for the null-space coefficient. The difference is that [30] involves trigonometric computations before the voting stage, but votes linearly for the geometric parameter, whereas we keep linearity until the voting stage but vote for the non-linear angle corresponding to . Our approach evaluates far fewer trigonometric functions (only one atan2).
We plug this specific fit into our detector without changing other parts and evaluate it on scenes from [62] which contain spherical everyday objects. Tab. I summarizes the dataset and reports our accuracy while Fig. 12 qualitatively shows that our sphere-specific detector can indeed operate in challenging real scenarios. Our algorithm is able to detect a sphere in many difficult cases, as long as the sphere is partially visible. Unlike many Hough-transform-based methods, we also do not have to specify the radius. Note that, due to the reduced basis size, this type-specific fit can meet real-time criteria.
6.3.5 Comparison to model based detectors
The literature is overwhelmed by the number of 3D model-based pose estimation methods. Hence, we decide to compare our model-free approach to the model-based ones. For that, we take the cylinders subset of the recent ITODD dataset [63] and run our generic quadric detector without training or specifying the type. Visuals of different methods are presented in Fig. 13 whereas detection performance is reported in Tab. II. Our task is not to explicitly estimate the pose. Thus, we manually accept a hypothesis if ICP [64] converges to a visually pleasing outcome. Note that multiple models are an important source of confusion for us, as we vote on generic quadrics. However, our algorithm outperforms certain detectors, even though we are solving a more generic problem, as our shapes are allowed to deform into geometries other than cylinders.
7 Discussion and Conclusions
We presented a fast and robust pipeline for generic primitive detection in noisy and cluttered 3D scenes. Our first contribution is a novel, linear fitting formulation for oriented point quadruplets. We thoroughly analyzed this fit and devised an efficient null-space voting which uses three pieces of point primitives plus a simple local search instead of a full four oriented point fit. Together, the fitting and voting establish the most minimal case known to date, three oriented points, potentially paving the way towards real-time operation. While our detector targets a generic surface, we can, optionally, convert to a type-specific fit to boost speed and accuracy.
Unless made specific, our method is surpassed by type-specific fits in detection rate since solving the generic problem is more difficult. It remains an open issue to bring the performance to the levels of type-specific fits. Nevertheless, if the design matrix targets a specific type, we perform even better. Degenerate cases are also difficult for us as shown in Fig. 14, but we always find a non-degenerate configuration good-enough to approximate the primitive.
Acknowledgments
The authors would like to thank Bertram Drost and Maximilian Baust for fruitful discussions.
References
- [1] T. Birdal and S. Ilic, “Point pair features based object detection and pose estimation revisited,” in 3D Vision (3DV). IEEE, 2015.
- [2] J. R. Miller, “Analysis of quadric-surface-based solid models,” IEEE Computer Graphics and Applications, vol. 8, no. 1, pp. 28–42, Jan 1988.
- [3] G. Cross and A. Zisserman, “Quadric reconstruction from dual-space geometry,” in International Conference on Computer Vision, 1998.
- [4] G. Taubin, “Estimation of planar curves, surfaces, and nonplanar space curves defined by implicit equations with applications to edge and range image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 11, pp. 1115–1138, 1991.
- [5] J. Andrews and C. H. Séquin, “Type-constrained direct fitting of quadric surfaces,” Computer-Aided Design and Applications, 2014.
- [6] T. Tasdizen, “Robust and repeatable fitting of implicit polynomial curves to point data sets and to intensity images,” Ph.D. dissertation, Brown University, 2001.
- [7] M. M. Blane, Z. Lei, H. Çivi, and D. B. Cooper, “The 3L algorithm for fitting implicit polynomial curves and surfaces to data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
- [8] S. Allaire, J.-J. Jacq, V. Burdin, C. Roux, and C. Couture, “Type-constrained robust fitting of quadrics with application to the 3D morphological characterization of saddle-shaped articular surfaces,” in International Conference on Computer Vision. IEEE, 2007.
