TL;DR
This paper introduces a novel surface-to-image representation for sphere-type surfaces that enables the effective application of CNNs, achieving state-of-the-art results in shape analysis tasks.
Contribution
It proposes a low distortion covering map for surface-to-image representation, facilitating deep learning on geometric data with improved accuracy.
Findings
Achieves state-of-the-art results in shape retrieval and classification.
Provides a low distortion, single-image surface representation.
Enables effective CNN application to 3D surface data.
Table 1: Shape retrieval results on SHREC17 (Section 5.2).

| Method | P@N | R@N | F1@N | mAP | NDCG |
|---|---|---|---|---|---|
| FURUYA_DLAN | 0.814 | 0.683 | 0.706 | 0.656 | 0.754 |
| Tatsuma_ReVGG | 0.705 | 0.769 | 0.719 | 0.696 | 0.783 |
| SHREC16-Bai_GIFT | 0.678 | 0.667 | 0.661 | 0.607 | 0.735 |
| Deng_CM-VGG-6DB | 0.412 | 0.706 | 0.472 | 0.524 | 0.642 |
| Spherical CNN [7] | 0.701 | 0.711 | 0.699 | 0.676 | 0.756 |
| SO(3) Equivariant CNNs [12] | 0.717 | 0.737 | - | 0.685 | - |
| Ours | 0.749 | 0.741 | 0.734 | 0.709 | 0.794 |

Table 2: Shape classification results on ModelNet40 (Section 5.3).

| Method | Inputs | Accuracy |
|---|---|---|
| Learning Gims [40] | mesh | 83.9% |
| 3DShapeNets [49] | voxels | |
| VoxNet [29] | voxels | |
| Pointnet[36] | points | |
| Pointnet++ [37] | points | |
| Dynamic graph CNN [47] | points | |
| PCNN [2] | points | |
| Spherical CNN [7] | spherical | |
| SO(3) Equivariant CNNs [12] | spherical | |
| Spherical on unstructured grid [19] | spherical | |
| Octahedron unfolding (rot ) | spherical | |
| Equirectangular projection (rot ) | spherical | |
| Ours | spherical | |
| Ours (rot ) | spherical | |

Table 4: Gluing instructions computed by Algorithm 2 (Section 4.1.1).

| Gluing instructions | |||
|---|---|---|---|
| 3 | 3 | ||
| 3 | 6 | ||
| 3 | 9 | ||
| 4 | 2 | ||
| 4 | 4 | ||
| 4 | 6 | ||
| 4 | 8 | ||
| 4 | 10 | ||
| 5 | 5 | ||
| 5 | 10 | ||
| 6 | 6 | ||
| 6 | 9 |

Table 5: U-net architecture used for surface segmentation (Section 5.4).

| Spatial dimensions | Layer | Kernel size | # input channels | # output channels |
|---|---|---|---|---|
| 512 x 512 | Conv2d | 5 | 3 | 128 |
| | Conv2d | 3 | 128 | 128 |
| | MaxPool2d | 2 | | |
| 256 x 256 | Conv2d | 3 | 128 | 128 |
| | Conv2d | 3 | 128 | 128 |
| | MaxPool2d | 2 | | |
| 128 x 128 | Conv2d | 3 | 128 | 128 |
| | MaxPool2d | 2 | | |
| 64 x 64 | Conv2d | 3 | 128 | 256 |
| | MaxPool2d | 2 | | |
| 32 x 32 | Conv2d | 3 | 256 | 512 |
| | MaxPool2d | 2 | | |
| 16 x 16 | Conv2d | 3 | 512 | 512 |
| | Conv2d | 3 | 512 | 512 |
| | UpSample | | | |
| 32 x 32 | Conv2d | 3 | 1024 | 256 |
| | Conv2d | 3 | 256 | 256 |
| | UpSample | | | |
| 64 x 64 | Conv2d | 3 | 512 | 128 |
| | UpSample | | | |
| 128 x 128 | Conv2d | 3 | 256 | 128 |
| | UpSample | | | |
| 256 x 256 | Conv2d | 3 | 256 | 128 |
| | UpSample | | | |
| 512 x 512 | Conv2d | 3 | 256 | 128 |
| | Conv2d | 1 | 128 | 8 |
Surface Networks via General Covers
Niv Haim (equal contribution)
Nimrod Segol (equal contribution)
Heli Ben-Hamu
Haggai Maron
Yaron Lipman
Weizmann Institute of Science
Rehovot, Israel
Abstract
Developing deep learning techniques for geometric data is an active and fruitful research area. This paper tackles the problem of sphere-type surface learning by developing a novel surface-to-image representation. Using this representation we are able to quickly adapt successful CNN models to the surface setting.
The surface-image representation is based on a covering map from the image domain to the surface. Namely, the map wraps around the surface several times, making sure that every part of the surface is well represented in the image. Differently from previous surface-to-image representations, we provide a low distortion coverage of all surface parts in a single image. Specifically, for the use case of learning spherical signals, our representation provides a low distortion alternative to several popular spherical parameterizations used in deep learning.
We have used the surface-to-image representation to apply standard CNN architectures to 3D models including spherical signals. We show that our method achieves state of the art or comparable results on the tasks of shape retrieval, shape classification and semantic shape segmentation.
1 Introduction
Adapting deep learning methods to geometric data (e.g., shapes) is a vibrant research area that has already produced state of the art algorithms for several geometric learning tasks (e.g., [36, 37, 42]).
Two prominent approaches are: (i) mapping the geometric data to tensors (e.g., images) and using off-the-shelf convolutional neural network (CNN) architectures and optimization techniques [42, 49, 40, 27]; and (ii) developing novel architectures and optimization techniques that are tailored to the geometric data [28, 36, 37]. An important benefit of (i) is that it reduces the geometric learning task to an image learning one, which allows the extensive algorithmic progress of neural networks for images to be applied directly to geometric data.
Some previous attempts following (i) perform learning tasks on geometric data using projections to 2D planes, e.g., by rendering the shapes [42]. Such projections are not injective and suffer from occlusions, and thus often require a collection of projections for a single shape. Other methods embed the shape in an encapsulating 3D grid [49, 29]; these methods require dealing with higher dimensional tensors and are usually less robust to deformations. Other methods [40, 27] try to find low distortion 2D mappings to an image domain. In this case the intrinsic dimensionality of the data is preserved; however, these maps suffer from high distortion and/or ignore the difference between the topologies of the surface (no boundary) and the image (with boundary).
In this paper, we advocate a novel 2D mapping method for representing sphere-type (genus zero, e.g., the human model in Figure 1a, left) surfaces as images. The challenge in using an image to represent a surface has two aspects: geometrical and topological. Geometrically, a general curved surface cannot be mapped to a flat domain (i.e., the image) without introducing a significant distortion. Topologically, an image has a boundary while sphere-type surfaces do not; hence, any mapping between the two will introduce cuts and discontinuities. Furthermore, a naive application of 2D convolution to the image would be ambiguous on the surface (see Figure 2 and Subsection 3.1).
To address these challenges we think of the image as a periodic domain (i.e., a torus) and relax the notion of a one-to-one mapping to that of a covering map from the image domain onto the surface. That is, we construct a mapping from the image domain to the surface that covers the surface several times. For example, Figure 1a visualizes a degree- covering map, meaning that the surface appears times in the image; note how each part of the surface appears with low distortion at least once in the image. The image generated by our covering map is periodic, namely its left and right boundaries as well as its bottom and top boundaries correspond, making the image boundaryless. Importantly, since image convolution is well defined on a torus, it will translate to a continuous convolution-like operator on the surface [27].
Applying our method to surface learning is easy: use a covering map to transfer functions of interest over the input surfaces (e.g., the coordinate functions) to images and apply one’s favorite CNN with periodic padding.
We tested our method in two scenarios: spherical signal learning [9, 7], and surface collection learning. For spherical signal learning, our approach provided state of the art results among all spherical methods on a shape retrieval dataset (SHREC17 [39]) and a shape classification dataset (ModelNet40 [49]). For surface collection learning, our method produced state of the art results on a surface segmentation dataset (Humans [27]). Our contributions are:
- We introduce a broad family of low distortion surface-to-toric image representations. The toric image representation allows applying off-the-shelf CNNs to general genus-zero surfaces.
- In particular, we provide a framework for learning spherical signals using CNNs.
- We introduce a practical algorithm for computing toric covers of genus-zero surfaces.
Our code is available at https://github.com/nivha/surface_networks_covers
2 Previous work
Applying deep learning techniques to geometric data has proved highly successful in the last few years. A wide variety of methods have been suggested; the most popular approaches are volumetric based methods (e.g., [49, 29]), rendering based methods (e.g., [42, 48, 51]), spectral based methods (e.g., [5, 10]) and methods that operate directly on the surface itself (e.g., [28]). A popular related problem is learning on point clouds, which has received a lot of attention lately (see e.g., [36, 2, 25]).
Here, we restrict our attention to intrinsic or parameterization-based surface methods and refer the reader to the above mentioned works and a recent survey [4] for further information.
Local parameterization. Such methods (e.g., [28, 3, 30]) extract local surface patches and use them in order to learn point representations. In [28] the authors use local polar coordinates as the patch operator. In a follow-up work, [3] use projections on oriented anisotropic diffusion kernels, where [30] learn the patch operator using a Gaussian mixture model. In contrast to these works, we employ a global parameterization which represents the shape using a single image.
Global parameterization. Other methods use a global parameterization of the surface to a canonical domain. [40, 41] use an area-preserving parameterization and map surfaces to a planar domain (going through a sphere); the global area-preserving parameterization cannot cover the surface with low distortion everywhere and depends on the specific cut made on the surface.
The most similar method to ours is [27], which proposes gluing four copies of the surface into a torus and mapping it conformally (i.e., preserving angles) to a flat torus, where the convolution is well defined. Their map is defined by a choice of three points on the surface and suffers from significant angle and scale distortion; see Figure 1b (e.g., the head, right arm and torso). In order to cover each point on the surface reasonably well, the authors sample multiple triplets of points from each surface, where each triplet focuses on a different part of the surface. In a follow-up work, [15] use the same parameterization as a surface representation for Generative Adversarial Networks (GANs) [13]. In order to deal with the high distortion of each single parameterization, the authors devise a multi-chart structure and rely on given sparse correspondences between the surfaces.
Convolutions on tangent planes. [46] define convolutions on surfaces by working on the tangent planes. [31] also define the convolutions on tangent planes and relate convolutions on nearby points using parallel transport. [34] define convolutions on surfaces by extending the notion of a signal on a surface into a directional signal and build layers that are equivariant to the choice of reference directions. [17] utilize a 4-rotational symmetric field to define a domain for convolution on a surface.
Convolutions of spherical signals. Our work targets learning of general genus-zero surfaces. In particular, it can facilitate learning of spherical signals, a task that has received growing interest in the last few years. [43, 9, 52] note that an equirectangular projection of a spherical signal suffers from large distortions and suggest network architectures that try to compensate for these distortions. [6] perform 2D convolution on spherical strips extracted from the spherical signal. [19] suggest defining the convolution of a spherical signal as a linear combination of differential operators with learned weights. In a different line of work, [7, 12, 23] propose networks that are invariant to the natural action of SO(3) on spherical signals. [8] advocate the notion of gauge equivariance as the correct equivariance notion on manifolds, and construct gauge equivariant networks on spheres.
Other methods. [50] tackle the shape segmentation problem by a novel architecture that operates on local features (such as normals) and global features (such as distances) and then fuses them together. [24] propose an improved graph neural network model based on the Dirac operator.
3 Preliminaries
In this section we discuss our choice of periodic images (i.e. images with toric topology) and introduce branched covering maps, the main mathematical tool used in our approach.
3.1 Convolutions on flattened spheres
A standard way to apply CNNs to a signal on a sphere-type surface is to represent it as an image and apply standard 2D convolution. Since representing a sphere as an image requires cutting and duplicating the cuts, different boundary segments in the image represent the same segment on the sphere.
When the transformation in the image domain between two duplicated boundary segments is a pure translation, applying 2D convolution at any two matching points on these segments yields exactly the same value. In other cases, such as the equirectangular spherical projection [43] or the octahedron spherical projection [35, 40], 2D convolution at two matching points results in two different values. Figure 2 shows an example where duplicated image boundary segments are marked with arrows of the same color; a pair of matching points (marked ) is shown in each example along with an illustration of a convolution kernel. Note that the kernel is consistent at the duplicated points only in the toric topology. A similar point of view for toric images was suggested in [27]. We extend it to a more general family of toric images of sphere-type surfaces.
3.2 Branched covering maps
This section provides a brief introduction to branched covering maps (for more details see [16]). We start with a formal definition:
Definition 1.
Let and be topological spaces. A map is a branched covering map if every point except for a finite set of points has a neighborhood , such that is a disjoint union of homeomorphic copies of . (A homeomorphism is a continuous map with a continuous inverse.)
The set of points are called branch points.
A simple example for a branched covering map is , for , and for some integer . The function has one branch point at . Every point , has distinct pre-images . However, the point has a single pre-image . We say that the point has pre-images located at [math], or that [math] is a pre-image with multiplicity . The ramification index of over is the multiplicity at , namely for all and for . We denote it as . Figure 3b shows this example for . In fact, this example captures all the local behaviors of covering maps: around a point with the map looks like the map .
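The inline formulas of this example did not survive extraction. The description matches the power map z ↦ z^d on the complex plane, and the following minimal sketch assumes that reading. The function name `preimages` and the sample point are ours, chosen only for illustration.

```python
import cmath

def preimages(w, d):
    """Return the distinct solutions z of z**d == w (assumed map: z -> z**d)."""
    if w == 0:
        # The branch point has a single pre-image, counted with multiplicity d.
        return [0j]
    r, theta = cmath.polar(w)
    # The d-th roots are equally spaced on a circle of radius r**(1/d).
    return [cmath.rect(r ** (1.0 / d), (theta + 2 * cmath.pi * k) / d)
            for k in range(d)]

d = 3
generic = preimages(1 + 1j, d)   # a non-branch point: d distinct pre-images
branch = preimages(0, d)         # the branch point: one pre-image
print(len(generic), len(branch))
```

Under this assumption the ramification structure of a generic point is d ones, while that of the origin is the single index d.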
Let us give another example: Consider the function . It has a branch point at with two distinct pre-images. Namely, . Here, the ramification index of [math] over is and the ramification index of over is . We say that the ramification structure of [math] is , formally:
Definition 2.
Let be a branched covering map, a branch point and , the number of pre-images of . The ramification structure of is the multi-set of ramification indices of its pre-images, denoted by . The ramification type of is the collection of its ramification structures, .
Figure 3a depicts a branch point with three distinct pre-images, , and ramification structure . Note that the ramification structure of a non-branch point is a trivial multi-set of ones: , see e.g., the red dot in Figure 3a.
The sum of the ramification indices of any point in is independent of the choice of the point (see [11], page 44 Proposition 7), namely
[TABLE]
Lastly, is called the degree of the covering. Intuitively, the degree of the covering counts how many times covers , or alternatively how many copies of can be found in .
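For reference, the statement that the ramification indices over any point sum to the degree can be written as follows. The symbols here are our own labels, since the original notation was lost in extraction; this is a sketch of the standard identity rather than a verbatim copy of Equation (1).

```latex
% Assumed notation: E : X \to Y is the branched covering map, y \in Y is any point,
% x_1, \dots, x_{n(y)} are its distinct pre-images and r(x_j) their ramification indices.
\[
  \sum_{j=1}^{n(y)} r(x_j) \;=\; d \qquad \text{for every } y \in Y .
\]
% Power-map example z \mapsto z^d: a generic point has d pre-images of index 1
% (1 + \dots + 1 = d), and the branch point 0 has one pre-image of index d.
```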
3.3 Riemann-Hurwitz formula
A key fact about ramification types of branched covering maps between (boundaryless) surfaces is the Riemann-Hurwitz formula (RH), which connects the genus (i.e., number of handles) of the surfaces with the ramification type. In our case, we map a torus to a sphere-type surface and get the corresponding RH formula:
[TABLE]
A quick derivation of this formula is given in Section E.1.
Therefore, the RH formula sets a necessary condition on the possible ramification types of such branched covering maps. For example, the ramification type satisfies the RH equations but the ramification type does not (in this case , , ), implying that there is no covering map with this ramification type. We note that Equations (1) and (2) are necessary but not sufficient conditions.
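The two necessary conditions can be checked mechanically. The sketch below assumes the standard Riemann-Hurwitz relation for a degree-d cover of a sphere (Euler characteristic 2) by a torus (Euler characteristic 0), under which the ramification indices r over all branch points satisfy sum(r - 1) = 2d. Equations (1) and (2) are not legible in this copy, so their exact form is an assumption, and the two ramification types tested are our own examples.

```python
def satisfies_rh(ramification_type, d):
    """Necessary conditions for a degree-d branched cover of a sphere by a torus.

    ramification_type: one list of ramification indices per branch point.
    """
    # Condition 1: the indices over every branch point sum to the degree.
    if any(sum(structure) != d for structure in ramification_type):
        return False
    # Condition 2 (assumed Riemann-Hurwitz form): total excess ramification is 2 * d.
    excess = sum(r - 1 for structure in ramification_type for r in structure)
    return excess == 2 * d

print(satisfies_rh([[3, 1, 1]] * 5, d=5))  # five branch points of structure {3, 1, 1} -> True
print(satisfies_rh([[2, 1]] * 3, d=3))     # three simple branch points -> False
```

Passing this check does not guarantee that a cover exists, in line with the remark that the conditions are necessary but not sufficient.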
4 Approach
Our goal is transferring signals (i.e., functions) from a sphere-type surface to the image domain (i.e., the flat torus: unit square with opposite ends identified). This is done by constructing a branched covering map
[TABLE]
and pulling back the signals to the image using . That is, given a signal that we want to transfer, the value of a pixel is set to . We represent the surface using a triangular mesh.
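Pulling back a signal amounts to evaluating it at the image of each pixel. The sketch below illustrates this on a one-dimensional toy analogue: an unbranched degree-d cover of the circle by itself, not the torus-to-surface map of the paper. The names `pull_back`, `cover` and `signal` are hypothetical.

```python
import math

def pull_back(signal, cover, resolution):
    """Sample signal(cover(t)) at `resolution` evenly spaced pixels t in [0, 1)."""
    return [signal(cover(i / resolution)) for i in range(resolution)]

d = 3

def cover(t):
    # Toy degree-d cover of the unit circle (parameterized by [0, 1)) by itself.
    return (d * t) % 1.0

def signal(s):
    # A signal living on the target circle.
    return math.cos(2 * math.pi * s)

image = pull_back(signal, cover, resolution=12)
print([round(v, 3) for v in image])
```

The pulled-back image contains d copies of the signal and is periodic, which is the one-dimensional counterpart of the surface appearing several times in a boundaryless image.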
We build the covering map in two steps, as a composition of two functions:
[TABLE]
where is a torus-type surface built out of copies of , is a branched covering map, and is a homeomorphism between the two tori and (see Figure 4 for illustration).
4.1 Computing the branched covering map
In this section we describe how we construct the mesh out of the mesh and the branched covering map . The idea is to cut and glue together several copies of the input surface in a way that generates a toric covering space corresponding to a specific choice of .
First, we choose branch points from the set of vertices of (using farthest point sampling), a degree and a valid ramification type satisfying Equations (1)-(2). Our algorithm then consists of the following steps:
Step (i): We cut the mesh along disjoint paths, all emanating from the same (arbitrary) vertex in and ending at the branch points for . Figure 5 shows this for . Topologically, is a disk, with all branch points at its boundary.
Step (ii): is then duplicated times, to form copies . Figure 5 shows the copies with as a white dot and the branch points as colored dots.
Step (iii): We glue the copies of to create the surface as follows. Consider a branch point ; it has copies located in each of the copies of , see e.g., the blue dots in Figure 5. Denote by and the two boundary edges emanating from the -th copy of . Note that on the original surface is glued to ; since every is a duplicate of , can be glued to any , . Therefore, to describe the gluing of the edges emanating from we use a permutation (a permutation is a bijection ): is glued to . The collection of all permutations (one permutation per branch point)
[TABLE]
is called the gluing instructions. Given gluing instructions we use it to stitch the boundary of the copies of to construct the toric surface (i.e., genus one). The mapping is then defined by: map to its original version in , and extend linearly in each triangle (i.e., face) of . is a well defined branched covering map. The gluing procedure is summarized in Algorithm 1. In Subsection 4.1.1 we describe the algorithm for computing the gluing instructions given the desired ramification type .
4.1.1 Computing the gluing instructions
In this paper we limit our attention to ramification types of the form
[TABLE]
where is the cover degree, is the number of branch points, and is the maximal multiplicity of the branch points’ pre-images. The motivation for choosing these ramification types is two-fold: first, we want all branch points to be treated equally by the cover; second, applying higher ramification order improves area distortion of protruding parts (see e.g., [21]); see Figure 1 and Subsection 4.3 for an example.
First, let us compute necessary conditions for defined in (5) to be a feasible ramification type. Equation (1) is automatically satisfied since . Plugging in (2) we get
[TABLE]
This sets a trade-off between and : higher values of , while reducing distortion of protruding parts, force a higher degree of the cover, which produces more copies of in the image. Practically, we found that and are both good options that strike a good balance between and .
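To make the trade-off concrete, the sketch below enumerates feasible parameter triples. It assumes that every branch point has one pre-image of multiplicity a and d - a pre-images of multiplicity one, so that each branch point contributes a - 1 to the Riemann-Hurwitz count and the constraint reads k (a - 1) = 2d. The labels k (number of branch points), d (degree) and a (maximal multiplicity) are ours, since the original symbols and Equation (6) are not legible here.

```python
def feasible_types(max_degree, max_branch_points):
    """Enumerate (k, d, a) with k * (a - 1) == 2 * d and 2 <= a <= d."""
    found = []
    for k in range(1, max_branch_points + 1):
        for d in range(2, max_degree + 1):
            for a in range(2, d + 1):
                if k * (a - 1) == 2 * d:
                    found.append((k, d, a))
    return found

types = feasible_types(max_degree=10, max_branch_points=6)
for k, d, a in types:
    print(f"k={k} branch points, degree d={d}, max multiplicity a={a}")
```

If the first two columns of the gluing-instruction table are read as k and d, all twelve pairs listed there appear in this enumeration.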
To compute gluing instructions we start with satisfying (6). The next theorem (proved in Section E.2) provides a necessary and sufficient condition for the gluing instructions to furnish a cover with ramification type :
Theorem 1.
A set of gluing instructions yields a branched covering map with ramification type if and only if the following conditions hold:
- (i) The cycle structure of equals the ramification structure of , i.e., .
- (ii) is a product one tuple. That is, .
- (iii) The group generated by is a transitive subgroup of . Namely, for each there exists so that .
Theorem 1 indicates that we should search for permutations with prescribed cycle structures. That is, the permutations , if they exist, are in some prescribed conjugacy classes of the permutation group. Algorithm 2 performs such a search, more or less exhaustively, using conditions (ii) and (iii) to prune options that cannot lead to a solution .
Since theoretically not all satisfying Equation (6) have a corresponding covering map, Algorithm 2 can terminate without finding gluing instructions. In this case, according to Theorem 1 we know that there is no covering map with ramification type . Nevertheless, it is rare to find such examples in practice and indeed we did not encounter such a case in our experiments. Table 4 contains the results of Algorithm 2 for any permissible with so that they can be used as input to Algorithm 1.
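The three conditions of Theorem 1 are easy to verify for a candidate tuple of permutations. The following is a minimal sketch of such a check, not Algorithm 2 itself. Permutations are represented as tuples of images, the product is taken as function composition with the rightmost factor applied first (an assumed convention), and the example tuple is hypothetical rather than the gluing instructions reported in the paper.

```python
from functools import reduce

def compose(p, q):
    """Return the permutation i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[i] for i in q)

def cycle_type(p):
    """Cycle lengths of a permutation, sorted in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths, reverse=True)

def is_transitive(perms):
    """True if the group generated by perms has a single orbit."""
    orbit, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for p in perms:
            if p[i] not in orbit:
                orbit.add(p[i])
                frontier.append(p[i])
    return len(orbit) == len(perms[0])

def check_gluing(perms, structures):
    """Check conditions (i)-(iii) for candidate gluing instructions."""
    identity = tuple(range(len(perms[0])))
    right_cycles = len(perms) == len(structures) and all(
        cycle_type(p) == sorted(s, reverse=True) for p, s in zip(perms, structures))
    product_one = reduce(compose, perms) == identity
    return right_cycles and product_one and is_transitive(perms)

c = (1, 2, 0, 3, 4)            # the 3-cycle (0 1 2)
u = (0, 1, 3, 4, 2)            # the 3-cycle (2 3 4)
u_inv = (0, 1, 4, 2, 3)        # its inverse (2 4 3)
example = [c, c, u, u_inv, c]  # hypothetical tuple for five branch points, degree 5
print(check_gluing(example, [[3, 1, 1]] * 5))
```

A search in the spirit of Algorithm 2 could enumerate permutations of the required cycle type and keep only tuples that pass this check.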
4.2 Flattening the toric surface
The last part of our covering map computation is the computation of the map . Equivalently, we compute . To that end we use a version of the Orbifold-Tutte embedding [1]. We first cut along the two generating loops of the torus (using [20], Algorithm 5) to get a disk-type surface . Second, we compute a bijective piecewise affine map by solving a sparse linear system of equations , where and , and is the number of vertices in the disk-like mesh . This system is a discrete version of the Poisson equation [26], see Section G for details on how to construct . We use to map the vertices of to and extend linearly to get the piecewise affine map .
The resulting map is discrete harmonic [26], approximately conformal up to a linear transformation, and as proven in [1], a bijection.
4.3 Example
Figure 1a depicts the case , , . Thus ; every branch point has three distinct pre-images, where two have ramification one, and one with order- ramification. The gluing instructions in this case, computed using Algorithm 2, are:
[TABLE]
Note that each of these permutations has a cycle structure as required in Theorem 1 (i); conditions (ii)-(iii) can be checked as well. These gluing instructions were used to glue the copies of (as shown in Figure 5 and described in Algorithm 1) to generate the representation shown in Figure 1a.
5 Experiments
To evaluate the efficacy of our method we tested it in two main scenarios: learning signals on the sphere, and learning sphere-type surface data.
5.1 Evaluation
In this section we compare the geometric properties of our representation to standard or existing techniques. Figure 6 shows the area and angle distortion of our method (right, in blue) and two other popular methods for sphere flattening: equirectangular projection (see e.g., [43]) and octahedron unfolding projection (see [35]). Area distortion is computed as the determinant of the differential of the cover map (treated as affine over each triangle of ), and angle distortion is the condition number of the differential. Since our image representation contains several copies of each triangle of , we use the least distorted one for the histogram, as we want each part of the surface to appear in the image at least once with low distortion. As can be seen in Figure 6, our projection has better angle preservation with only a mild sacrifice in area distortion.
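As an illustration of these two measures, the sketch below computes them for a single triangle. It assumes that both the source and target triangles are given in 2D coordinates (a surface triangle would first be laid out isometrically in the plane) and does not fix which direction of the map the paper measures. The function names are ours.

```python
import math

def jacobian_2x2(src, dst):
    """Linear part (a, b, c, d) of the affine map taking triangle src to triangle dst."""
    (ax, ay), (bx, by), (cx, cy) = src
    (px, py), (qx, qy), (rx, ry) = dst
    # Edge matrices: columns are the two edge vectors leaving the first vertex.
    s11, s12, s21, s22 = bx - ax, cx - ax, by - ay, cy - ay
    t11, t12, t21, t22 = qx - px, rx - px, qy - py, ry - py
    det_s = s11 * s22 - s12 * s21
    # Inverse of the source edge matrix, then J = T * S^{-1}.
    i11, i12, i21, i22 = s22 / det_s, -s12 / det_s, -s21 / det_s, s11 / det_s
    return (t11 * i11 + t12 * i21, t11 * i12 + t12 * i22,
            t21 * i11 + t22 * i21, t21 * i12 + t22 * i22)

def distortion(src, dst):
    """Return (area distortion, angle distortion) = (|det J|, sigma_max / sigma_min)."""
    a, b, c, d = jacobian_2x2(src, dst)
    det = a * d - b * c
    trace = a * a + b * b + c * c + d * d  # trace of J^T J
    gap = math.sqrt(max(trace * trace - 4 * det * det, 0.0))
    sigma_max = math.sqrt((trace + gap) / 2)
    sigma_min = math.sqrt((trace - gap) / 2)
    return abs(det), sigma_max / sigma_min

reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
stretched = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
print(distortion(reference, stretched))  # (2.0, 2.0)
```

An isometry gives (1, 1); a degenerate target triangle makes the smallest singular value zero, which this sketch does not guard against.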
In Figure 7 we repeat this experiment with a sphere-type model of a human and compare the area and angle distortion of five different types of image representations. While the method of [40] (leftmost, in red) preserves area better, it suffers from significant angle distortion. The orbifold covering of [27] (second to the left, in red) is angle-preserving, but suffers from notable area shrinking. Our covering maps (green and blue) strike a balance between angle and area preservation. The covering of type (middle, in blue) has the least area distortion and we chose it for the segmentation task (below).
The top row of Figure 7 compares the different image representations by reconstructing the original model. Specifically, for each vertex of the mesh we sampled its coordinates directly from the image at the vertex location (we used images here). In our representation, we take the coordinates from the vertex copy with the least area distortion. Note that the image representations of [40] and [27] do not represent well significant parts of the surface (e.g., the right leg and the head).
5.2 3D shape retrieval
The first application of our method is 3D shape retrieval. We use the SHREC2017 benchmark [39] that contains 3D models from different categories. There are two separate challenges: (i) the shapes are consistently aligned; and (ii) the shapes are randomly rotated. We tackle the (harder) second challenge.
Since the shapes are not of genus zero, we follow the protocol of [7], which projects the meshes onto a bounding sphere using ray casting and records six functions on this sphere: distance to the model, of the model angles (this is done for both the model and its convex hull). We then use our method to transfer these six spherical signals to periodic images (flat torus). See Figure 8 for an example of such a shape representation.
We compare our method to the top methods in each category, the Spherical CNN method [7], and the recent SO(3) equivariant networks suggested in [12]. The results are summarized in Table 1; note that in the F1 measure we score first among all methods.
For this application we use a slight modification of the Inception v3 architecture [45]. We train the network with the ADAM optimizer [22] for epochs with learning rate , batch size of , and learning rate decay of . Training took minutes per epoch on a Tesla V100 Nvidia GPU. At evaluation time we average the output of the network over randomly rotated copies of the query model.
5.3 Surface classification
We apply our method to the ModelNet40 surface classification benchmark [49] that contains 3D models from 40 different categories. As in the shape retrieval task, we follow the protocol of [7] to generate input signals on a sphere. We then use our method to represent the spherical signals as periodic images and apply the same Inception v3 model as in the shape retrieval task. We present peak performance results (following [19]) for two scenarios that are popular in the literature: (i) the shapes are rotated randomly about the axis; and (ii) the shapes are learned in their original orientation. We train the network with the ADAM optimizer [22] for epochs for scenario (i) and for scenario (ii) with learning rate , batch size , and learning rate decay . Training took minutes per epoch for the first scenario (that contains rotation augmentations) and minutes per epoch for the second scenario on a Tesla V100 Nvidia GPU.
Table 2 compares our results with several recent methods including the baselines of equirectangular projection (e.g., [43]) and octahedron unfolding projection [35]. Our results are the best among all spherical learning methods.
5.4 Surface segmentation
While our first two applications targeted spherical signals, our last application learns signals defined on general sphere-type human models. In particular, we perform human model semantic segmentation. We use the benchmark from [27] that consists of 373 training models from multiple sources and 18 test models. randomly sampled training models were used as a validation set (18 models). All models are given as triangular meshes. For each model, each face is labeled according to a predefined partition of the human body (e.g., head, torso, hands; a total of labels). The task is to label the triangles of a new unseen human model with these labels. For each model we generate an augmented set of images per mesh by permuting the order of the branch points, multiplying the vertices by a random orthogonal matrix and a uniform scale sampled from as suggested in [34], and applying small periodic image translations of pixels. In evaluation, as the toric image contains values for each triangle on the original mesh, we use the label of the triangle with the largest area. Furthermore, we use 10 random augmentations of test images and label each mesh face using a majority vote. Table 3 summarizes the results of this experiment, where our method outperformed previous methods; Figure 9 shows typical segmentation results.
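The evaluation rule described above (pick the largest copy of each face, then vote across augmentations) can be sketched as follows. The data layout, the function names and the sample predictions are hypothetical, and the number of copies per face is chosen arbitrarily for the example.

```python
from collections import Counter

def label_from_copies(copies):
    """Pick the label predicted on the copy with the largest image area.

    copies: list of (area_in_image, predicted_label) pairs for one mesh face.
    """
    return max(copies, key=lambda copy: copy[0])[1]

def majority_vote(labels):
    """Most frequent label across augmentations (ties go to the first one seen)."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical predictions for a single face: 3 augmentations, 5 copies each.
per_augmentation = [
    [(0.2, "arm"), (1.5, "torso"), (0.1, "arm"), (0.4, "torso"), (0.3, "head")],
    [(0.9, "torso"), (0.2, "arm"), (0.3, "arm"), (0.1, "arm"), (0.5, "arm")],
    [(0.3, "head"), (0.2, "torso"), (1.1, "arm"), (0.6, "torso"), (0.1, "arm")],
]
votes = [label_from_copies(copies) for copies in per_augmentation]
print(votes, majority_vote(votes))
```

Selecting by area before voting means that small, highly distorted copies of a face never contribute to its final label.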
For this application we used the U-net architecture [38] with layers (see Table 5 for details). We used a weighted loss with equal probability labels, and trained the network using stochastic gradient descent with momentum [44] for epochs with learning rate , batch size , and learning rate decay of . Training takes hours per epoch on a Tesla V100 Nvidia GPU.
6 Conclusions
In this paper, we introduce a new method for representing sphere-type surfaces as toric images that can be used in standard Convolutional Neural Network frameworks for shape learning tasks. The method allows faithful representation of all parts of the surface in a single image, thus alleviating the need to generate multiple maps to cover each surface. Our method is general and can target both spherical signal learning tasks as well as more general learning tasks that involve signals on different genus zero surfaces. Practically, we showed that off-the-shelf CNN models applied to images generated with our method lead to state of the art performance in the tasks of shape retrieval, shape classification and surface segmentation.
The main limitation of this work is its restriction to genus-zero surfaces. Models of this kind are abundant, but certainly do not exhaust all 3D models. We would like to seek a generalization of this method to point clouds, depth images and more general topological types.
7 Acknowledgements
This research was supported in part by the European Research Council (ERC Consolidator Grant, “LiftMatch” 771136) and the Israel Science Foundation (Grant No. 1830/17).
Appendix A Convolution on a spherical mesh
In Figure 10 we depict a cover map from the torus (textured square image on the left) to a human surface (middle); this map covers the human times. We further show how a standard convolution stencil (in yellow) translates to a seamless convolution on the surface. Note that the texture seams on the human models are arbitrary and only indicate a transition to a different copy of the surface.
Appendix B Guidelines on Choosing Parameters
Adding branch points helps reduce the local distortion in protruding parts; we therefore recommend choosing as many branch points as there are protruding parts common in the dataset (e.g. for humans, for octopuses etc.). As mentioned in Section 4.1.1, we choose a ramification type of the form for each branch point.
As noted in Section 4.1.1, higher ramification () also improves area distortion of protruding parts. However, in that case, we are limited by the RH formula (Equation 6). So we would recommend choosing the highest possible (e.g. as appears in Table 4) and taking (number of copies) to satisfy Equation 6. Also note that higher implies higher (number of copies). Therefore, for a fixed image resolution we would like the highest number of branch points for which all relevant parts are still visible in the image.
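The trade-off between ramification and number of copies can be illustrated with a small sketch. It assumes the standard Riemann-Hurwitz count for a torus covering a sphere, in which the ramification excess summed over all branch points equals twice the degree, and it assumes that every branch point has exactly one ramified preimage of index `r` (all other preimages unramified). The function name `degree_for_cover` and its parameters are hypothetical and are not taken from the paper. Satisfying this count is a necessary condition only; a matching tuple of permutations must still exist.

```python
from typing import Optional


def degree_for_cover(num_branch_points: int, ramification: int) -> Optional[int]:
    """Return the degree d with k * (r - 1) == 2 * d, or None if no such d exists.

    Assumption: each of the k branch points has one preimage of ramification
    index r, so each branch point contributes (r - 1) to the Riemann-Hurwitz sum.
    """
    total_excess = num_branch_points * (ramification - 1)
    if total_excess % 2 != 0:
        return None  # 2 * d must be even, so an odd excess is impossible
    d = total_excess // 2
    if d < ramification:
        return None  # a cycle of length r cannot act on fewer than r copies
    return d


# Four simple branch points give a double cover; raising r raises d.
print(degree_for_cover(4, 2))
print(degree_for_cover(5, 3))
```

Under these assumptions, a higher `r` or more branch points forces a larger `d`, which is why a fixed image resolution limits how many branch points remain practical.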
Appendix C Gluing Instructions
As mentioned in Section 4.1.1, for each choice of number of branch points , degree and ramification type satisfying Equations (5) and (6), we need to compute a product one tuple of permutations satisfying the conditions of Theorem 1. We note that this computation can be done in an offline step, before using Algorithm 1 to compute the toric parameterization. In Table 4 we provide gluing instructions corresponding to each valid choice of , and that complies with Equations (5) and (6). Each of the gluing instructions in Table 4 can be used as input to Algorithm 1.
Appendix D Implementation Details
Learning.
We use PyTorch [32] for learning. All the experiments use toric images generated by our algorithm and off-the-shelf CNN architectures with a single change: we replace the standard zero padding with periodic padding.
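A minimal sketch of this single change, assuming it is realized through the `padding_mode="circular"` option of `torch.nn.Conv2d` (the paper does not state how the periodic padding was implemented). The channel counts and image size are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A 3x3 convolution whose border pixels wrap around, as on a torus.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode="circular")

x = torch.randn(1, 3, 16, 16)  # one toric image with 3 feature channels
y = conv(x)

# With periodic padding, a cyclic shift of the input produces the same
# cyclic shift of the output, so the image has no privileged boundary.
x_shifted = torch.roll(x, shifts=(5, 7), dims=(2, 3))
y_shifted = conv(x_shifted)
is_periodic = torch.allclose(
    y_shifted, torch.roll(y, shifts=(5, 7), dims=(2, 3)), atol=1e-5
)
print(y.shape, is_periodic)
```

The shift check shows why this padding suits toric images: the convolution treats the left and right edges, and the top and bottom edges, as adjacent.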
Data generation.
For the surface segmentation task we use a cover of the type , that is, . For the spherical learning tasks (shape retrieval and classification) we use a cover of type . The locations of the branch points are chosen using farthest point sampling. We use the shortest paths from an arbitrary base point to all branch points in order to cut the mesh. When the mesh does not allow such a path we subdivide it locally (without changing its geometry). This pre-processing step is implemented in Matlab. It takes seconds on average (a relatively long running time, due to non-optimized mesh cutting code in Matlab) to generate a periodic (toric) image for a mesh with vertices on a single CPU core of an Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz machine.
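The branch point selection can be sketched as greedy farthest point sampling. This version measures Euclidean distance between vertex positions, which is an assumption: the paper does not state the metric, and geodesic distance on the mesh is a plausible alternative. The function name and the `start` parameter are hypothetical.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> list:
    """Greedily pick k point indices, each farthest from those already chosen."""
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


vertices = np.array([[0.0, 0, 0], [1.0, 0, 0], [10.0, 0, 0], [5.0, 0, 0]])
print(farthest_point_sampling(vertices, 3))
```

On meshes with protruding parts, such as limbs, this greedy rule tends to place the samples at the extremities.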
D.1 Segmentation Task
Prediction.
The network outputs per-pixel labels. In order to obtain a label for each face in the original mesh , we first transfer the per-pixel logits to the faces of the toric mesh using bilinear interpolation sampled at the faces’ centers. Since each face in has duplicated faces in the toric mesh (), each face in has sets of logits. We use a weighted average of the sets of logits, where the weights are the area scales of the faces . The label of is the argmax of this weighted-average of logits. This means that better scaled faces (in the toric mesh) receive more weight when deciding how to label a face in the original mesh .
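The aggregation step can be sketched as follows, assuming the bilinear sampling has already produced one logit vector for each copy of each face. The array layout and the function name `aggregate_face_labels` are hypothetical, and the weights are normalized per face before averaging.

```python
import numpy as np


def aggregate_face_labels(logits, area_scales):
    """Label each original face from the logits of its copies in the toric mesh.

    logits:      (num_faces, num_copies, num_classes) logits sampled at face centers
    area_scales: (num_faces, num_copies) area scale of each copy, used as weight
    Returns an integer array of shape (num_faces,) holding the predicted labels.
    """
    logits = np.asarray(logits, dtype=float)
    weights = np.asarray(area_scales, dtype=float)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted average over the copies, then argmax over the classes.
    averaged = (weights[..., None] * logits).sum(axis=1)
    return averaged.argmax(axis=1)


# One face with two copies and two classes; the copies disagree.
face_logits = [[[2.0, 0.0], [0.0, 1.0]]]
print(aggregate_face_labels(face_logits, [[1.0, 3.0]]))
print(aggregate_face_labels(face_logits, [[3.0, 1.0]]))
```

In the toy example the two copies disagree, and the copy with the larger area scale decides the label.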
Architecture.
We use a version of a U-Net [38]. The feature-channel sizes are given in Table 5. After each convolution we use with a Batch-Normalization layer [18]. Each UpSample layer is a nearest-neighbour interpolation with scale-factor 2.
Appendix E Proofs
E.1 Riemann-Hurwitz formula
Consider a branched covering map of degree and branch points, from a toric mesh to a spherical mesh . We prove that the ramification type of must satisfy the Riemann-Hurwitz formula (9).
Proof of Riemann-Hurwitz formula.
First, we note that the set of branch points can always be chosen from .
Every node has pre-images in . However, a branch point has pre-images in . Every edge has exactly pre-images in , that is . Similarly, .
By computing the Euler characteristic for a toric surface:
[TABLE]
Using
[TABLE]
and rewriting we obtain the Riemann-Hurwitz formula (RH), in its version for a map from a toric surface to a spherical surface:
[TABLE]
∎
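For reference, the counting argument can be written out in the following notation, which is assumed here and may differ from the paper's. Let $V_T, E_T, F_T$ and $V_S, E_S, F_S$ denote the vertices, edges and faces of the toric mesh and of the spherical mesh, let $d$ be the degree, and let $n_i$ be the number of pre-images of the $i$-th branch point, $i = 1, \dots, k$.

```latex
\begin{align}
  0 = \chi(T) &= |V_T| - |E_T| + |F_T|, \\
  |V_T| &= d\,|V_S| - \sum_{i=1}^{k} (d - n_i), \qquad
  |E_T| = d\,|E_S|, \qquad |F_T| = d\,|F_S|, \\
  0 &= d\,\bigl(|V_S| - |E_S| + |F_S|\bigr) - \sum_{i=1}^{k} (d - n_i)
     = 2d - \sum_{i=1}^{k} (d - n_i).
\end{align}
```

Here the last step uses $\chi(S) = |V_S| - |E_S| + |F_S| = 2$ for the sphere. For example, if a branch point has a single pre-image of ramification index $r$ and all its other pre-images are unramified, then $n_i = d - r + 1$ and it contributes $d - n_i = r - 1$ to the sum.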
E.2 Proof of Theorem 1
We recall the following topological facts. A degree branched covering map from a torus to a genus-zero surface induces a group homomorphism, called the monodromy representation, from the fundamental group of to .
The homomorphism is given as follows: We take each loop based at a point , and lift it to starting from a preimage of . This lift has to end at another preimage of . Due to properties of the lifting, this induces a permutation on the preimages of in , referred to as the fiber of .
The group has generators and a single relation. The generators are the loops around each of the branch points. The relation is .
Our gluing instructions, , will be the images of under the monodromy representation. We shall now give a proof of Theorem 1, namely that our algorithm produces a cover with ramification if and only if the gluing instructions are a tuple of permutations satisfying the conditions of Theorem 1.
Proof of Theorem 1.
First we prove that the conditions in the theorem are necessary.
For , we note that a lift of a loop around a branch point with a particular ramification structure induces a permutation with the same cycle structure.
For , the fact that implies (using group homomorphism) that
For fix in the fiber of . Since is connected, there exists a path connecting and . The loop is a loop starting and ending at whose lift takes to . Thus, the action of the group generated by is transitive.
Conversely, suppose we have a product one tuple satisfying the conditions of the theorem and branch points . Then condition (i) allows us to define an action of the group on . Following the construction in [16, pp. 68-70], the space is a covering space of , where is the universal cover of . The transitivity of implies that this covering space is connected. Condition implies by the Riemann-Hurwitz formula that is topologically a torus.
Let be the space produced from Algorithm 1. Note that the construction in Algorithm 1 implies that lifting a loop circling each branch point induces the permutation on the fiber of a generic point. Thus, the action of on coincides with the action of on . Since every action of on (up to conjugation) produces a unique (up to homeomorphism) covering space, we deduce that is homeomorphic to .
∎
Comment:
The equivalence between branched covering maps and tuples of permutations satisfying the conditions of Theorem 1 is well known. This equivalence is commonly referred to as Riemann’s existence theorem (RET). However, to the best of our knowledge, it was previously not known how to practically construct any given branched covering map (our Algorithm 1).
Appendix F Gluing Instructions
We now turn to describing an algorithm that finds tuples of permutations corresponding to a prescribed ramification structure , up to simultaneous conjugation (relabeling of the branch points). We call such a tuple a product one tuple. We implement our algorithm using the Magma computational algebra system [14].
We denote the conjugacy class in associated with the cycle structure of by . In the algorithm construction we use the following:
Claim 1.
(1) is a transitive permutation group and if and only if , where is a transitive product one tuple with .
(2) The set can be completed to a transitive product one tuple compatible with a ramification structure if and only if , for any ( denotes the centralizer), can be completed to a transitive product one tuple compatible with .
Proof.
(1) follows from the observations that adding elements to a transitive generator set keeps the set transitive, and that for the cycle structure of and are the same. For (2), note that for any and it holds that . Thus, for any , we have that any tuple with is the same as a tuple with , up to simultaneous conjugation. ∎
The main idea in the algorithm for finding all gluing instructions corresponding to a ramification type is to exhaustively go over all tuples and check whether they form a product one tuple. We use the claim above to prune this exhaustive search, as described in Algorithm 2. Note that this computation is done once for a given cover ramification type and is reused for all models using this type of cover.
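The exhaustive part of this search can be sketched in plain Python for small degrees. This is not Algorithm 2: it omits the pruning based on the claim above, it does not identify tuples that differ by simultaneous conjugation, and it does not use Magma. Permutations are stored as tuples `p` with `p[i]` the image of `i`, and the product of a tuple is composed with the convention stated in `compose`; all names are hypothetical.

```python
from itertools import permutations, product


def compose(a, b):
    """Return the permutation that applies b first and then a."""
    return tuple(a[b[i]] for i in range(len(a)))


def cycle_type(p):
    """Return the cycle lengths of p, sorted in decreasing order."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def is_transitive(perms, d):
    """Check that the orbit of 0 under the given permutations is all of 0..d-1."""
    orbit, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for p in perms:
            if p[i] not in orbit:
                orbit.add(p[i])
                frontier.append(p[i])
    return len(orbit) == d


def product_one_tuples(d, types):
    """Yield transitive tuples in S_d with the given cycle types and product identity."""
    identity = tuple(range(d))
    candidates = [
        [p for p in permutations(range(d)) if cycle_type(p) == tuple(t)]
        for t in types
    ]
    for tup in product(*candidates):
        prod = identity
        for s in tup:
            prod = compose(prod, s)
        if prod == identity and is_transitive(tup, d):
            yield tup


# Degree 3 with three branch points, each of cycle type (3).
print(list(product_one_tuples(3, [(3,), (3,), (3,)])))
```

The number of candidate tuples grows quickly with the degree and the number of branch points, which is what motivates pruning the search and running it once per cover type.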
Appendix G Orbifold-Tutte embedding of
We compute by solving a sparse linear system following [1]:
[TABLE]
Here and , where is the number of vertices in the disk-like mesh . The linear system (10) is constructed by putting together four sets of linear equations as follows:
First, for all interior vertices we set the discrete harmonic equation:
[TABLE]
where is the set of vertices in adjacent to and are the cotangent weights [33].
Let and be the generators of the homotopy group of . Denote by the intersection of the two loops and . In , the vertex has four copies . Next, we ensure that these four copies are mapped to the four corners of the unit square . Explicitly,
[TABLE]
Each vertex has a twin vertex such that and correspond to the same vertex in the uncut mesh . Moreover, each such vertex has its origin either in or in .
We set the vertices whose origin is in to be different by a constant translation in and the vertices whose origin is in to be different by a constant translation in . Namely:
[TABLE]
where and are twins, and is either or , depending on whether the origin of belongs to or .
Finally, we set each vertex to be the weighted average of both its neighbors and the translated neighbors of its twin:
[TABLE]
with as before.
References
- [1] Noam Aigerman and Yaron Lipman. Orbifold Tutte embeddings. ACM Trans. Graph., 34(6):190–1, 2015.
- [2] Matan Atzmon, Haggai Maron, and Yaron Lipman. Point convolutional neural networks by extension operators. ACM Trans. Graph., 37(4):71:1–71:12, July 2018.
- [3] Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems, pages 3189–3197, 2016.
- [4] Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
- [5] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.
- [6] Zhangjie Cao, Qixing Huang, and Ramani Karthik. 3D object classification via spherical projections. In 2017 International Conference on 3D Vision (3DV), pages 566–574. IEEE, 2017.
- [7] Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. arXiv preprint arXiv:1801.10130, 2018.
- [8] Taco S Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. Gauge equivariant convolutional networks and the icosahedral CNN. arXiv preprint arXiv:1902.04615, 2019.
