DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals

Chiyu "Max" Jiang; Dana Lynn Ona Lansigan; Philip Marcus; Matthias Nießner

arXiv:1901.11082·cs.CV·August 16, 2019

TL;DR

The paper introduces DDSL, a differentiable layer that bridges geometric mesh representations with raster images, enabling advanced shape optimization and end-to-end training in neural networks.

Contribution

It presents a novel, generalizable differentiable rasterization layer for simplex meshes, with a complete theoretical framework and efficient backpropagation, applicable to various geometric deep learning tasks.

Findings

01 Effective gradient-based shape optimization demonstrated with airfoil design.

02 Surpassed state-of-the-art in polygonal image segmentation using DDSL.

03 Generalizes to arbitrary simplex degrees and dimensions.

Abstract

We present a Deep Differentiable Simplex Layer (DDSL) for neural networks for geometric deep learning. The DDSL is a differentiable layer compatible with deep neural networks for bridging simplex mesh-based geometry representations (point clouds, line mesh, triangular mesh, tetrahedral mesh) with raster images (e.g., 2D/3D grids). The DDSL uses Non-Uniform Fourier Transform (NUFT) to perform differentiable, efficient, anti-aliased rasterization of simplex-based signals. We present a complete theoretical framework for the process as well as an efficient backpropagation algorithm. Compared to previous differentiable renderers and rasterizers, the DDSL generalizes to arbitrary simplex degrees and dimensions. In particular, we explore its applications to 2D shapes and illustrate two applications of this method: (1) mesh editing and optimization guided by neural network outputs, and (2)…

Tables (7)

Table 1. List of math symbols in our method.

Notation	Description
$d$	Dimension of Euclidean space $ℝ^{d}$
$j$	Degree of simplex. Point $j = 0$ , Line $j = 1$ , Tri. $j = 2$ , Tet. $j = 3$
$n, N$	Index of the $n$ -th element among a total of $N$ elements
$Ω_{n}^{j}$	Domain of $n$ -th element of order $j$
$𝒙$	Cartesian space coordinate vector. $𝒙 = (x, y, z)$
$𝒌$	Spectral domain coordinate vector. $𝒌 = (u, v, w)$
$p$	Index of a point in a simplex element. $p \in ℕ$ , $p \leq j + 1$
$i$	Imaginary number unit

Table 2. Comparison of Cityscapes image segmentation IoU against baseline algorithms on the test set.

Model	Bicycle	Bus	Person	Train	Truck	Motorcycle	Car	Rider	Mean
SquareBox [4]	35.41	53.44	26.36	39.34	54.75	39.47	46.04	26.09	40.11
Dilation10 [47]	46.80	48.35	49.37	44.18	35.71	26.97	61.49	38.21	43.89
DeepMask [33]	47.19	69.82	47.93	62.20	63.15	47.47	61.64	52.20	56.45
SharpMask [34]	52.08	73.02	53.63	64.06	65.49	51.92	65.17	56.32	60.21
Polygon-RNN [4]	52.13	69.53	63.94	53.74	68.03	52.07	71.17	60.58	61.40
Polygon-RNN++ [1]	63.06	81.38	72.41	64.28	78.90	62.01	79.08	69.95	71.38
PolygonNet (Ours)	62.26	84.38	68.62	82.42	76.57	63.57	78.08	64.10	72.50

Table 3. Comparison of network parameters and evaluation time for a batch of 16 image crops.

Model	# Params	Runtime (s)
Polygon-RNN	58M	$2.0332 \pm 0.0168$
Polygon-RNN++	100M	$2.3241 \pm 0.0181$
PolygonNet (Ours)	24M	$0.0287 \pm 0.0022$

Table 4. Network architecture notation list.

Notation	Meaning
Conv(a, b, c, d)	Convolutional layer with $a$ input channels, $b$ output channels, kernel size $c$ , and stride $d$ .
MaxPool(a)	Maximum Pooling with a kernel size of $a$ .
ReLU	Rectified Linear Unit activation function.
FC(a, b)	Fully connected layer with $a$ input channels and $b$ output channels.
ResNet-50(a)	ResNet-50 architecture with $a$ output channels.
BN	Batch Normalization.

Table 5. 2D computational speed (polygon w/ 250 edges).

Res²	16	32	64	128	256
Fwd Time (ms)	2.30	1.88	2.48	5.02	20.13
Bwd Time (ms)	4.33	3.80	5.93	16.69	59.15

Table 6. 3D computational speed (tri-mesh w/ 1300 faces).

Res³	4	8	16	32
Fwd Time (ms)	9.88	9.32	14.21	78.62
Bwd Time (ms)	14.47	10.06	34.26	239.51

Table 7. Evaluation results ( $\times 10^{-2}$ ).

DDSL	Accuracy	Complete	Chamfer
w/o	8.47	9.84	9.16
w/	2.15	1.83	1.99

Equations

$$C = \{\theta_{0} v_{0} + \dots + \theta_{j} v_{j} \mid \theta \succeq 0,\ \bm{1}^{T}\theta = 1\}$$

$$f_{n}^{j}(\bm{x}) = \begin{cases}\rho_{n}, & \bm{x}\in\Omega_{n}^{j}\\ 0, & \bm{x}\notin\Omega_{n}^{j}\end{cases}, \qquad f^{j}(\bm{x}) = \sum_{n=1}^{N} f_{n}^{j}(\bm{x})$$

$$F_{n}^{j}(\bm{k}) = \rho_{n} i^{j} \gamma_{n}^{j} S$$

$$S := \sum_{t=1}^{j+1} \frac{e^{-i\sigma_{t}}}{\prod_{l=1, l\neq t}^{j+1}(\sigma_{t}-\sigma_{l})}, \qquad \sigma_{t} := \bm{k}\cdot\bm{x}_{t}$$

$C_{n}^{j}$

$\hat{B}_{n}^{j}$

$$\gamma_{n}^{j} = \frac{C_{n}^{j}}{C_{I}^{j}} = j!\, C_{n}^{j}$$

$$F^{j}(\bm{k}) = \sum_{n=1}^{N} F_{n}^{j}(\bm{k}) = \sum_{n=1}^{N} \rho_{n} i^{j} \gamma_{n}^{j} S$$

$$F_{n}^{j}(\bm{k}) = \dots\Big(\dots + \sum_{t=1}^{j} \frac{e^{-i\sigma_{t}}}{\sigma_{t}\prod_{l=1, l\neq t}^{j}(\sigma_{t}-\sigma_{l})}\Big)$$

$$s_{n^{\prime}}\gamma_{n^{\prime}}^{j} = j!\,\det(J) = j!\,\det([\bm{x}_{1}, \bm{x}_{2}, \dots, \bm{x}_{j}])$$

$$\frac{\partial\gamma_{n}^{j}}{\partial\bm{x}_{p}} = \frac{(-1)^{j+1}/2^{j}}{\gamma_{n}^{j}} \sum_{m=1, m\neq p}^{j+1} A_{pm}\bm{D}_{pm}$$

$$S_{t} := \frac{e^{-i\sigma_{t}}}{\prod_{l=1, l\neq t}^{j+1}(\sigma_{t}-\sigma_{l})}$$

$$\frac{\partial S}{\partial\bm{x}_{p}} = \Big(-i S_{p} + \sum_{t=1, t\neq p}^{j+1} \frac{S_{t}+S_{p}}{\sigma_{t}-\sigma_{p}}\Big)\bm{k}$$

$$\frac{\partial F_{n}^{j}(\bm{k})}{\partial\bm{x}_{p}} = \rho_{n} i^{j}\Big(\Lambda\bm{k} + \Gamma\sum_{m=1, m\neq p}^{j+1} A_{pm}\bm{D}_{pm}\Big)$$

$\Lambda := \dots$

$\Gamma := \dots$

$L_{mres}$

$i \in \{0, 1, 2, 3\}$ , $res \in \{224, 112, 56, 28\}$

$L_{smooth}$

$L$

$$\frac{\partial\gamma_{n}^{j}}{\partial\bm{x}_{p}} = \frac{(-1)^{j+1}/2^{j}}{2\gamma_{n}^{j}} \sum_{m=1}^{j+2}\sum_{n=1}^{j+2} \tilde{A}_{mn}\tilde{D}_{nm}$$

$$\tilde{D} = \begin{bmatrix}
0 & \cdots & 0 & 0 & 0 & \cdots \\
\vdots & \ddots & \vdots & \vdots & \vdots & \cdots \\
0 & \cdots & 0 & \tilde{D}_{p,p+1} & 0 & \cdots \\
0 & \cdots & \tilde{D}_{p+1,p} & 0 & \tilde{D}_{p+1,p+2} & \cdots \\
0 & \cdots & 0 & \tilde{D}_{p+2,p+1} & 0 & \cdots \\
\vdots & \cdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$

$\tilde{D}_{p+1,n}$

$\tilde{D}_{m,p+1}$

$$\sum_{m=1}^{j+2}\sum_{n=1}^{j+2} \tilde{A}_{mn}\tilde{D}_{nm} = 2\sum_{m=2, m\neq p+1}^{j+2} \tilde{A}_{p+1,m}\tilde{D}_{p+1,m}$$

$$\frac{\partial\gamma_{n}^{j}}{\partial\bm{x}_{p}} = \frac{(-1)^{j+1}/2^{j}}{\gamma_{n}^{j}} \sum_{m=1, m\neq p}^{j+1} A_{pm}\bm{D}_{pm}$$

$$\frac{\partial S}{\partial\bm{x}_{p}} = \sum_{t=1}^{j+1} \frac{\partial S_{t}}{\partial\bm{x}_{p}}$$

Case $t=p$ :

$$\frac{\partial S_{t}}{\partial\bm{x}_{p}} = \prod_{l=1, l\neq p}^{j+1}(\sigma_{p}-\sigma_{l})^{-1}\,(-i e^{-i\sigma_{p}})\,\bm{k} + e^{-i\sigma_{p}}\frac{\partial}{\partial\bm{x}_{p}}\prod_{l=1, l\neq p}^{j+1}(\sigma_{p}-\sigma_{l})^{-1} = \dots$$

Case $t\neq p$ :

$$\frac{\partial S_{t}}{\partial\bm{x}_{p}} = \dots\Big(\frac{\partial}{\partial\bm{x}_{p}}\Big(\frac{1}{\sigma_{t}-\sigma_{p}}\Big)\Big) = \dots$$

Code & Models

Repositories

maxjiang93/DDSL (PyTorch, official)

Full text

DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals

Chiyu “Max” Jiang*¹, Dana Lansigan*¹, Philip Marcus¹, Matthias Nießner²

¹UC Berkeley, ²Technical University of Munich

* Equal contributions

Abstract

We present a Deep Differentiable Simplex Layer (DDSL) for neural networks for geometric deep learning. The DDSL is a differentiable layer compatible with deep neural networks for bridging simplex mesh-based geometry representations (point clouds, line mesh, triangular mesh, tetrahedral mesh) with raster images (e.g., 2D/3D grids). The DDSL uses Non-Uniform Fourier Transform (NUFT) to perform differentiable, efficient, anti-aliased rasterization of simplex-based signals. We present a complete theoretical framework for the process as well as an efficient backpropagation algorithm. Compared to previous differentiable renderers and rasterizers, the DDSL generalizes to arbitrary simplex degrees and dimensions. In particular, we explore its applications to 2D shapes and illustrate two applications of this method: (1) mesh editing and optimization guided by neural network outputs, and (2) using DDSL for a differentiable rasterization loss to facilitate end-to-end training of polygon generators. We are able to validate the effectiveness of gradient-based shape optimization with the example of airfoil optimization, and using the differentiable rasterization loss to facilitate end-to-end training, we surpass state of the art for polygonal image segmentation given ground-truth bounding boxes.

1 Introduction

The simplicial complex (i.e., simplex mesh) is a flexible and general representation for non-uniform geometric signals. Various commonly used geometric representations, including point clouds, wire-frames, polygons, triangular meshes, and tetrahedral meshes, are examples of simplicial complexes. Leveraging deep learning architectures for such non-uniform geometric signals has been of increasing interest, and varied methodologies and architectures have been presented to deal with varied representations [3].

In this study, we propose a Deep Differentiable Simplex Layer (DDSL), which performs differentiable rasterization of arbitrary simplex mesh-based geometric signals. The DDSL is based upon the simplex Non-Uniform Fourier Transform (NUFT) [18] for the forward pass, which is highly generalizable across arbitrary topologies. Furthermore, we find the general differential form of the simplex NUFT, allowing for an efficient backward pass. Our work differs from previous work in the literature on differentiable rendering in two major ways. First, our network is generalizable across arbitrary simplex degrees and dimensions, making it a unified framework for a range of geometric representations. Second, while other differentiable renderers are specifically posed for projective rendering by projecting 3D meshes to 2D grids, the DDSL is capable of in-situ rasterization in the original dimension. Building on the differentiable nature of the rasterizer, we explore two unique use cases. First, using the differentiability of the DDSL, we can utilize Convolutional Neural Network (CNN) based deep learning models as surrogate models of physical properties for shape optimization, which is useful in a range of engineering disciplines. Second, using the DDSL as a neural network layer, we can formulate a differentiable rasterization loss that allows for end-to-end generation of shapes using a direct supervised approach, which can be useful in a range of computer vision problems.

To illustrate the two use cases, we perform three experiments. First, to validate the effectiveness of gradient propagation through the layer, we consider the toy problem of MNIST shape optimization, where we use gradients propagated through the neural network and the DDSL to manipulate and transform the input polygon mesh into a target digit (Sec. 4.2). Next, to further illustrate potential applications of neural shape optimization enabled by the DDSL, we investigate the classic engineering problem of airfoil optimization and show that the shape optimization pipeline effectively manipulates the input shape toward a desired lift-drag ratio (Sec. 4.2). Finally, to illustrate the effectiveness of the differentiable rasterization loss, we train a polygon generating neural network end-to-end with direct supervision to generate polygonal segmentation masks for image segmentation (Sec. 4.3). With the novel rasterization loss, we surpass the state of the art in the polygon segmentation task, with a much simpler network architecture and training scheme.

In summary, we contribute the following:

• We propose the DDSL, which is a differentiable rasterizer for arbitrary simplex-mesh based geometries. Its differentiable nature allows for its effective integration in deep neural networks.

• We show that the DDSL effectively facilitates shape optimization for engineering applications such as aerodynamic optimization of airfoils, using neural networks as surrogate models.

• We show that the DDSL can be used to produce a differentiable rasterization loss, which can be used to create direct supervision to facilitate end-to-end training of shape generators, with applications in polygonal segmentation mask generation.

• We develop and release code for effectively integrating the DDSL into deep neural networks (code available: https://github.com/maxjiang93/DDSL), with compelling computational performance benchmarks.

2 Related Work

We present a brief overview of geometric representations for deep learning, various related differentiable renderers, and related work in the space of our two exemplary applications.

Geometric Representations for Deep Learning

In general, there are two classes of geometric representations: the native form of simplex meshes, and a raster form that can be efficiently processed with grid-based network architectures such as CNNs. As simplex meshes come in various forms and dimensions (point clouds, meshes, etc.), there is a vast body of literature for geometric signals of different simplex degrees and dimensions. For example, PointNets have been specially designed for point clouds [36, 37], and various algorithms perform convolutions natively on the mesh manifold [17, 15, 2] or on the graph [10, 24, 46].

Grid-based algorithms, on the other hand, require the rasterization of a simplex-mesh based geometric signal for further processing by CNNs. Examples include binary-voxel based algorithms [32, 45], Truncated Signed-Distance Function (TSDF) based algorithms [7, 48, 40, 8], multi-view image based algorithms [41, 21], and hybrids [19, 6]. Compared to deep learning methods that directly perform convolutions on the simplex mesh, grid-based methods are more generalizable across shape topologies and computationally easier to implement, since they leverage highly efficient tensor operators such as 2D/3D convolution kernels for rasterized data. However, conventional voxelization methods are not differentiable with respect to the input mesh, and differentiable rasterizers have been proposed to close the gap between simplex and grid representations.

Differentiable Rasterization in Deep Learning

Recently, a series of differentiable projective renderers has been proposed. [30] proposed an approximate differentiable rasterizer for inverse graphics. [22] proposed a deep neural renderer that uses linear approximations for the gradients of the pixel intensity with respect to the vertex positions. [26] introduced a differentiable ray-tracer for differentiability of additional rendering effects. Very recently, [28] proposed a differentiable rasterizer that approximates rendering derivatives with soft boundaries. Various studies in face mesh reconstruction applications [11, 42, 43, 38] and general mesh reconstruction tasks [20, 25] utilize some form of differentiable rasterization to facilitate gradient flows in neural networks.

Shape Optimization

Shape optimization is essential in a broad range of engineering fields, including aerodynamic, mechanical, structural, and architectural design. Traditionally, shape optimization algorithms couple gradient-based or gradient-free optimizers (e.g., genetic algorithms, simulated annealing) with physics simulators, e.g., Computational Fluid Dynamics (CFD) and multiphysics software, for evaluation. For aerodynamic shape optimization, the adjoint method has been used for gradient-based optimization with sensitivities acquired from physics simulators [35, 16]. Recently, machine learning algorithms such as multilayer perceptrons have been used as surrogate models for the response surface to speed up evaluation and optimization [23, 31]. More recently, CNNs have been used for the evaluation of aerodynamic properties [49], and gradient-based optimization methods coupled with CNNs have been explored [14]. However, direct manipulation of the input mesh has not been achieved due to the lack of in-situ differentiable rasterization of polygons and 3D meshes.

Image Segmentation with Polygon Masks

Image segmentation is a central task in computer vision, and has been thoroughly studied. Much of the work in the image segmentation literature creates pixel-level masks [29, 39, 44, 12, 9, 27]. However, more recently, to address the need of assisting human annotators to create ground-truth segmentation labels, new network architectures such as PolygonRNN [4] and PolygonRNN++ [1] have been proposed for creating polygonal segmentation masks given ground-truth bounding boxes. Our work targets this application to explore a more effective and efficient polygon generating network using our DDSL-enabled rasterization loss.

3 Method

3.1 DDSL Overview

A schematic of the DDSL is presented in Fig. 1. The DDSL consists of three consecutive mathematical operations. First, it computes the Fourier transform of the simplicial complex by uniformly sampling it in the spectral domain. Second, it applies spectral filtering, multiplying the spectral signal with a Gaussian filter to eliminate ringing effects. Lastly, it applies the inverse fast Fourier transform (iFFT) to acquire the physical raster image corresponding to the input. Since the forward and backward methods of the filtering step (an element-wise product) and the iFFT are well known, we focus our analysis on the simplex NUFT, which we derive and detail below.
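The three operations can be sketched in one dimension, where the simplex is a single line segment with unit density. This is a minimal illustration and not the released implementation: the function names, the periodic domain of length `period`, the Gaussian width parameter `smooth`, and the direct (non-fast) inverse transform are all assumptions made for readability.

```python
import cmath
import math


def segment_nuft(a, b, k):
    """Fourier transform of the indicator of [a, b] at angular frequency k."""
    if k == 0.0:
        return complex(b - a)  # DC term: the segment length
    return 1j * (cmath.exp(-1j * k * b) - cmath.exp(-1j * k * a)) / k


def rasterize_segment(a, b, n=64, period=1.0, smooth=2.0):
    """Sample the spectrum, apply a Gaussian filter, invert onto n pixels."""
    width = smooth * period / n  # assumed filter std, in physical units
    spectrum = []
    for m in range(-n // 2, n // 2):
        k = 2.0 * math.pi * m / period
        gauss = math.exp(-0.5 * (k * width) ** 2)
        spectrum.append((k, gauss * segment_nuft(a, b, k)))
    raster = []
    for pixel in range(n):
        x = pixel * period / n
        # Direct inverse DFT for clarity; an FFT would be used in practice.
        value = sum(c * cmath.exp(1j * k * x) for k, c in spectrum) / period
        raster.append(value.real)
    return raster


raster = rasterize_segment(0.25, 0.75)
```

Pixels well inside the segment come out close to 1 and pixels well outside close to 0, with a smooth (anti-aliased) transition at the two end points whose width is set by the Gaussian filter.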

3.2 Mathematical Description

We represent discrete geometric signals as weighted simplicial complexes. We provide the following definitions for a $j$ -simplex and a $j$ -simplex mesh:

Definition 3.1 ( $j$ -simplex).

A simplex is the generalization of the two-dimensional triangle in other dimensions. The $j$ -simplex determined by $j+1$ affinely independent points $v_{0},\dots,v_{j}\in\mathbb{R}^{n}$ is

$$C = \{\theta_{0} v_{0} + \dots + \theta_{j} v_{j} \mid \theta \succeq 0,\ \bm{1}^{T}\theta = 1\}$$

where $\bm{1}$ is the vector with all entries one.

Definition 3.2 ( $j$ -simplex mesh).

A simplicial complex consisting only of $j$ -simplices is a homogeneous simplicial $j$ -complex, or a $j$ -simplex mesh.

Example 3.1 (Examples of simplices and simplex meshes).

A $0$ -simplex is a point, a $1$ -simplex is a line, a $2$ -simplex is a triangle, and a $3$ -simplex is a tetrahedron. The $0$ -, $1$ -, $2$ -, and $3$ -simplicial complexes are the point cloud and linear, triangular, and tetrahedral meshes, respectively.

Definition 3.3 (Functions over a $j$ -simplex element and a $j$ -simplex mesh).

The Piecewise-Constant Function (PCF) over a $j$ -simplex mesh consisting of $N$ simplices is the superposition of the density functions $f_{n}^{j}(\bm{x})$ for each $j$ -simplex with domain $\Omega_{n}^{j}$ and signal density $\rho_{n}$ :

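$$f_{n}^{j}(\bm{x}) = \begin{cases}\rho_{n}, & \bm{x}\in\Omega_{n}^{j}\\ 0, & \bm{x}\notin\Omega_{n}^{j}\end{cases}, \qquad f^{j}(\bm{x}) = \sum_{n=1}^{N} f_{n}^{j}(\bm{x})$$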

For the forward pass, we use the NUFT of a PCF over a $j$ -simplex mesh.

Proposition 3.1 (Forward pass).

The NUFT of a PCF over a simplex in a mesh is

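$$F_{n}^{j}(\bm{k}) = \rho_{n} i^{j} \gamma_{n}^{j} S, \qquad S := \sum_{t=1}^{j+1} \frac{e^{-i\sigma_{t}}}{\prod_{l=1, l\neq t}^{j+1}(\sigma_{t}-\sigma_{l})}, \qquad \sigma_{t} := \bm{k}\cdot\bm{x}_{t}$$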

where $\gamma_{n}^{j}$ is the content distortion factor, which is the ratio between the simplex content and the unit orthogonal simplex content. The simplex content $C_{n}^{j}$ is computed using the Cayley-Menger determinant:

[TABLE]

where each element $d_{st}^{2}$ of $\hat{B}_{n}^{j}$ is the squared distance between points $s$ and $t$ . The content of the unit orthogonal simplex $C_{I}^{j}$ is $1/j!$ , so the content distortion factor is

$$\gamma_{n}^{j} = \frac{C_{n}^{j}}{C_{I}^{j}} = j!\, C_{n}^{j}$$

From the linearity of the Fourier transform, the NUFT of a PCF over an entire $j$ -simplex mesh is

$$F^{j}(\bm{k}) = \sum_{n=1}^{N} F_{n}^{j}(\bm{k}) = \sum_{n=1}^{N} \rho_{n} i^{j} \gamma_{n}^{j} S$$
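The forward pass for a single simplex can be sketched in plain Python as below. The sketch assumes the standard Cayley-Menger relation $(C_{n}^{j})^{2} = \frac{(-1)^{j+1}}{2^{j}(j!)^{2}}\det(\hat{B}_{n}^{j})$ with the usual bordered squared-distance matrix, so that $\gamma_{n}^{j} = \sqrt{(-1)^{j+1}\det(\hat{B}_{n}^{j})/2^{j}}$ . All function names are hypothetical, and frequencies for which two $\sigma_{t}$ coincide (including $\bm{k}=0$ ) are not handled.

```python
import cmath
from math import sqrt


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]
    return det


def content_distortion(vertices):
    """gamma = j! * C, with C from the Cayley-Menger determinant (assumed form)."""
    j = len(vertices) - 1
    size = j + 2
    b_hat = [[0.0] * size for _ in range(size)]
    for s in range(1, size):
        b_hat[0][s] = b_hat[s][0] = 1.0  # bordering row and column of ones
        for t in range(1, size):
            b_hat[s][t] = sum((p - q) ** 2
                              for p, q in zip(vertices[s - 1], vertices[t - 1]))
    return sqrt((-1) ** (j + 1) * determinant(b_hat) / 2 ** j)


def simplex_sum(sigmas):
    """S = sum_t exp(-i sigma_t) / prod_{l != t} (sigma_t - sigma_l)."""
    total = 0j
    for t, s_t in enumerate(sigmas):
        denom = 1.0
        for l, s_l in enumerate(sigmas):
            if l != t:
                denom *= s_t - s_l
        total += cmath.exp(-1j * s_t) / denom
    return total


def nuft_simplex(vertices, k, rho=1.0):
    """F = rho * i^j * gamma * S for one j-simplex at frequency vector k."""
    j = len(vertices) - 1
    sigmas = [sum(ki * xi for ki, xi in zip(k, v)) for v in vertices]
    return rho * (1j ** j) * content_distortion(vertices) * simplex_sum(sigmas)


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
value = nuft_simplex(triangle, (1.0, 2.0))
```

For the unit right triangle and $\bm{k}=(1,2)$ , the result can be checked against the integral $\int_{0}^{1}\int_{0}^{1-x} e^{-i(x+2y)}\,dy\,dx = e^{-i} - \tfrac{1}{2}e^{-2i} - \tfrac{1}{2}$ .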

For efficient computing, we use the auxiliary node method (AuxNode), which utilizes signed content.

Corollary 3.1 (AuxNode).

To compute the Fourier transform of uniform signals in $j$ -polytopes represented by their watertight $(j-1)$ -simplex meshes using AuxNode, Eqn. (3) is modified as follows:

[TABLE]

where $s_{n^{\prime}}\gamma_{n^{\prime}}^{j}$ is the signed content distortion factor for the $n^{\prime}$ th auxiliary $j$ -simplex where $s_{n^{\prime}}\in\{-1,1\}$ . For practical purposes, assume that the auxiliary $j$ -simplex is in $\mathbb{R}^{d}$ where $d=j$ . The signed content distortion factor is computed using the determinant of the Jacobian matrix for parameterizing the auxiliary simplex to a unit orthogonal simplex:

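$$s_{n^{\prime}}\gamma_{n^{\prime}}^{j} = j!\,\det(J) = j!\,\det([\bm{x}_{1}, \bm{x}_{2}, \dots, \bm{x}_{j}])$$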

Proof.

Refer to [18]. ∎
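The role of signed content can be illustrated with a 2D polygon given by its boundary edges. The sketch below assumes the auxiliary node is placed at the origin and the boundary is ordered counter-clockwise; each edge then spans an auxiliary triangle whose signed area is a $2\times 2$ determinant divided by $2!$ . It computes only the signed content itself and makes no claim about how the distortion factor is normalized; the function name is hypothetical.

```python
def signed_edge_contents(polygon):
    """Signed area of the triangle (origin, x1, x2) for every boundary edge."""
    contents = []
    n = len(polygon)
    for idx in range(n):
        (x1, y1), (x2, y2) = polygon[idx], polygon[(idx + 1) % n]
        contents.append(0.5 * (x1 * y2 - x2 * y1))  # det([x1, x2]) / 2!
    return contents


# A unit square that does not contain the origin.
square = [(2.0, 1.0), (3.0, 1.0), (3.0, 2.0), (2.0, 2.0)]
contents = signed_edge_contents(square)
area = sum(contents)
```

Some auxiliary triangles have negative content and some positive. The parts that lie outside the polygon cancel in the sum, so only the boundary mesh is needed to recover the content of the enclosed polytope.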

For the backward pass, we derive the analytic derivative of the NUFT with respect to the vertex coordinates of a $j$ -simplex mesh. Following from the product rule, we require the derivatives of the content distortion factor $\gamma_{n}^{j}$ and the summation term $S$ to obtain the entire derivative of $F_{n}^{j}(\bm{k})$ .

Lemma 3.1 (Derivative of the content distortion factor).

The derivative of $\gamma_{n}^{j}$ with respect to vertex coordinate $\bm{x}_{p}$ is

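$$\frac{\partial\gamma_{n}^{j}}{\partial\bm{x}_{p}} = \frac{(-1)^{j+1}/2^{j}}{\gamma_{n}^{j}} \sum_{m=1, m\neq p}^{j+1} A_{pm}\bm{D}_{pm}$$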

where $\bm{D}_{pm}=2(\bm{x}_{p}-\bm{x}_{m})$ and $A_{pm}$ is the element in the $(p+1)$ th row and $(m+1)$ th column of $adj(\hat{B_{n}^{j}})$ .
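Lemma 3.1 can be checked numerically with a small sketch. It assumes $\gamma_{n}^{j} = \sqrt{(-1)^{j+1}\det(\hat{B}_{n}^{j})/2^{j}}$ with the usual bordered squared-distance matrix, takes adjugate entries from cofactors, and uses a 0-based vertex index `p`, so that vertex `p` sits in row `p + 1` of $\hat{B}_{n}^{j}$ . The helper names are hypothetical.

```python
from math import sqrt


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]
    return det


def cayley_menger(vertices):
    """Bordered matrix of squared pairwise distances."""
    size = len(vertices) + 1
    b_hat = [[0.0] * size for _ in range(size)]
    for s in range(1, size):
        b_hat[0][s] = b_hat[s][0] = 1.0
        for t in range(1, size):
            b_hat[s][t] = sum((p - q) ** 2
                              for p, q in zip(vertices[s - 1], vertices[t - 1]))
    return b_hat


def content_distortion(vertices):
    j = len(vertices) - 1
    return sqrt((-1) ** (j + 1) * determinant(cayley_menger(vertices)) / 2 ** j)


def adjugate_entry(matrix, row, col):
    """Entry (row, col) of adj(M), i.e. the (col, row) cofactor of M."""
    minor = [[v for c, v in enumerate(r) if c != row]
             for i, r in enumerate(matrix) if i != col]
    return (-1) ** (row + col) * determinant(minor)


def gamma_gradient(vertices, p):
    """d gamma / d x_p following Lemma 3.1, for 0-based vertex index p."""
    j = len(vertices) - 1
    b_hat = cayley_menger(vertices)
    scale = ((-1) ** (j + 1) / 2 ** j) / content_distortion(vertices)
    grad = [0.0] * len(vertices[p])
    for m in range(j + 1):
        if m == p:
            continue
        a_pm = adjugate_entry(b_hat, p + 1, m + 1)
        for axis in range(len(grad)):
            d_pm = 2.0 * (vertices[p][axis] - vertices[m][axis])
            grad[axis] += scale * a_pm * d_pm
    return grad


triangle = [(0.1, 0.2), (1.0, 0.3), (0.4, 1.1)]
grad0 = gamma_gradient(triangle, 0)
```

For a triangle, $\gamma$ is twice the area, so the gradient with respect to the first vertex can also be read off the cross-product formula. A central finite difference of `content_distortion` gives an independent check.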

Lemma 3.2 (Derivative of the summation term).

Let $S_{t}$ be one term in the summation term $S$ :

$$S_{t} := \frac{e^{-i\sigma_{t}}}{\prod_{l=1, l\neq t}^{j+1}(\sigma_{t}-\sigma_{l})}$$

The derivative of the summation term with respect to $\bm{x}_{p}$ is

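$$\frac{\partial S}{\partial\bm{x}_{p}} = \Big(-i S_{p} + \sum_{t=1, t\neq p}^{j+1} \frac{S_{t}+S_{p}}{\sigma_{t}-\sigma_{p}}\Big)\bm{k}$$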

where $\bm{k}$ is the spectral domain coordinate vector.
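Lemma 3.2 can be sketched and checked against finite differences as follows. The code reads the lemma as a scalar coefficient in $S_{t}$ and $\sigma_{t}$ multiplying the vector $\bm{k}$ ; function names are hypothetical, vertex indices are 0-based, and frequencies for which two $\sigma_{t}$ coincide are not handled.

```python
import cmath


def summation_terms(vertices, k):
    """Return (sigmas, terms) with terms[t] = S_t and sigmas[t] = k . x_t."""
    sigmas = [sum(ki * xi for ki, xi in zip(k, v)) for v in vertices]
    terms = []
    for t, s_t in enumerate(sigmas):
        denom = 1.0
        for l, s_l in enumerate(sigmas):
            if l != t:
                denom *= s_t - s_l
        terms.append(cmath.exp(-1j * s_t) / denom)
    return sigmas, terms


def summation(vertices, k):
    """S = sum_t S_t."""
    return sum(summation_terms(vertices, k)[1])


def summation_gradient(vertices, k, p):
    """dS/dx_p = (-i S_p + sum_{t != p} (S_t + S_p)/(sigma_t - sigma_p)) k."""
    sigmas, terms = summation_terms(vertices, k)
    coeff = -1j * terms[p]
    for t in range(len(terms)):
        if t != p:
            coeff += (terms[t] + terms[p]) / (sigmas[t] - sigmas[p])
    return [coeff * k_axis for k_axis in k]


triangle = [(0.1, 0.2), (1.0, 0.3), (0.4, 1.1)]
grad = summation_gradient(triangle, (1.0, 2.0), 0)
```

Because every vertex enters $S$ only through $\sigma_{p}=\bm{k}\cdot\bm{x}_{p}$ , the gradient is always parallel to $\bm{k}$ , which is why a single complex coefficient per vertex and frequency is enough.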

Proposition 3.2 (Backward pass).

Following from Lemmas 3.1 and 3.2, the derivative of $F_{n}^{j}(\bm{k})$ with respect to a point $\bm{x}_{p}$ in the simplex element $n$ is

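$$\frac{\partial F_{n}^{j}(\bm{k})}{\partial\bm{x}_{p}} = \rho_{n} i^{j}\Big(\Lambda\bm{k} + \Gamma\sum_{m=1, m\neq p}^{j+1} A_{pm}\bm{D}_{pm}\Big)$$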

where $A_{pm}$ is the element in the $p$ th row and $m$ th column of $adj(\hat{B_{n}^{j}})$ starting at $p=0$ and $m=0$ ,

[TABLE]

We provide a detailed derivation of Eqn. 14 as well as proofs of Lemmas 3.1 and 3.2 in Sec. A1 of the Appendix.

3.3 Deep Learning Architectures and Pipelines

We present a schematic of the deep learning model-driven shape optimization (Sec. 4.2) in Fig. 3, and a schematic of the polygon segmentation network (PolygonNet) in Figs. 3 and 4. A detailed description of the architectures is presented in Appendix B.

4 Experiments

4.1 Performance Benchmarking

We compare the runtime of our implementation of the backward pass over the DDSL with that of the numeric derivatives calculated using the finite difference method.

Experiment Setup

We perform tests for the 0-, 1-, 2-, and 3-simplex meshes in 3-dimensional space and examine the effects of mesh size (number of points in the mesh) and image resolution. We test mesh sizes ranging from 5 to 50 points and resolutions ranging from 4 to 32, and we run each test 100 times to acquire a distribution of data. For each run, we randomly generate a 3-dimensional simplex mesh of varied simplex degrees, varied densities, with random gradient values on each raster pixel. We then calculate the analytic and numeric derivatives for the DDSL using our implementation of Eqn. 14 and the finite difference method, respectively, and time each calculation.

Analysis of complexity

Since the analytic backward pass for computing the gradients using Eqn. 14 requires a computation for each pairing of a spectral coefficient with a vertex in a $j$ -simplex, the computational complexity of the analytic backward pass is the same as that of the forward pass, $\mathcal{O}((j+1)n_{e}m)$ , for a mesh of $n_{e}$ simplices and a raster of $m$ degrees of freedom. Finite difference, on the other hand, requires $n_{v}$ forward computations, one per vertex, each of complexity $\mathcal{O}((j+1)n_{e}m)$ . Assuming $n_{v}\propto n_{e}$ , the finite difference evaluation is of complexity $\mathcal{O}((j+1)n_{e}^{2}m)$ .
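The scaling argument can be made concrete with a toy cost model. It keeps only the leading-order terms from the analysis above and sets all constants to 1 (an assumption), so it predicts trends rather than measured runtimes; the function names are hypothetical.

```python
def analytic_backward_cost(j, n_elements, n_raster):
    """Leading-order cost of the analytic backward pass: (j + 1) * n_e * m."""
    return (j + 1) * n_elements * n_raster


def finite_difference_cost(j, n_elements, n_raster, n_vertices):
    """One forward pass of cost (j + 1) * n_e * m per perturbed vertex."""
    return n_vertices * (j + 1) * n_elements * n_raster


# Triangle mesh (j = 2) with 50 vertices and 96 faces on a 32^3 raster.
analytic = analytic_backward_cost(2, 96, 32 ** 3)
numeric = finite_difference_cost(2, 96, 32 ** 3, 50)
ratio = numeric / analytic
```

In this model the ratio equals the number of vertices, so the advantage of the analytic derivative grows linearly with mesh size.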

Results

The results of our mesh size and resolution runtime tests are shown in Fig. 5. In both tests and for all $j$ -simplices, our implementation of the analytic derivative consistently outperforms the numerical method for calculating the derivative by $10\sim 100\times$ in the range we tested.

4.2 Shape Optimization

We demonstrate the utility of the DDSL through the task of shape optimization. Since many physical characteristics depend on shape, shape optimization is an important and challenging task across many fields of science and engineering. We show that the DDSL allows us to accomplish this shape optimization task due to the analytic nature of its derivative.

General Experiment Setup

We pre-process each shape into a polygon of the shape’s boundary. The polygons are rasterized using the DDSL. We train neural networks on the raster images, and we use the gradients out of these neural networks for the shape optimization task.

Using gradient descent, we optimize a shape to a prescribed target value, which can be a shape classification or a physical quantity. Since we implemented the DDSL as a differentiable neural network layer, we can obtain the gradient of the target value with respect to the original shape directly from the neural network. Rather than directly manipulating vertices, we further propagate this gradient to control points attached to the original shape for enhanced robustness. Each control point has 3 degrees of freedom: translation in the $x$ and $y$ directions, and rotation about the point. More details about the control points are given in Sec. A2. We iterate the shape optimization process until the loss converges to zero.

MNIST

We first demonstrate shape optimization using the DDSL with the MNIST dataset of handwritten digits. Rather than using the traditional pixel images, we use polygons of the digits as inputs. The polygon form of MNIST digits can be acquired by contouring the original images. The objective of this experiment is to optimize a digit in the MNIST dataset to a target digit.

Airfoils

We further illustrate the functionality of the DDSL with the more practical task of aerodynamic shape optimization. For this experiment, we optimize an airfoil to a prescribed lift-drag ratio, which is related to the efficiency of an aerodynamic body. We use the airfoiltools.com database, consisting of 1,636 airfoils of aircraft wings and turbine blades, along with precomputed physical quantities such as drag and lift coefficients at different angles of attack and Reynolds numbers, acquired from CFD simulations. Airfoils are originally represented as polygons and rasterized using the DDSL. We then train a neural network to predict lift-drag ratios of airfoils at specific angles of attack and Reynolds numbers and use this neural network for the shape optimization task. When optimizing the airfoil shape, we specify the angle of attack of the airfoil and the Reynolds number of the flow.

Results

We show some iterations of the shape optimization process for the MNIST and airfoil experiments, as well as graphs showing the loss over each iteration, in Fig. 6. The success of the DDSL in the shape optimization task is most intuitively clear in the MNIST experiment, where the original digit, ‘1,’ is transformed into a ‘3.’ In the airfoil experiment, the lift-drag ratio increased, as desired. The optimized shape is an airfoil with its trailing edge deflected downwards, resembling an aircraft deploying its flaps at takeoff to increase lift. Both experiments exhibit a monotonic decrease in loss, which converges to zero, confirming that optimization was achieved.

4.3 Segmentation Mask Generation

To further illustrate applications of the DDSL layer in deep learning applications, we experiment on the task of image segmentation by generating polygonal masks. In contrast to conventional segmentation frameworks that output pixel masks, directly predicting polygons allows for a more efficient and flexible output structure, and has been shown to be effective in assisting human annotators in labeling new datasets [4, 1].

Experiment Setup

For direct comparison with the state of the art, we follow the experiment setup of [4] and [1] for predicting polygonal masks. In contrast to the conventional setup of instance segmentation, we assume crops of input images given ground-truth bounding boxes, and we output the corresponding polygonal masks using our neural network. Following the two studies, we train and test our model on the Cityscapes dataset [5]. The Cityscapes dataset is one of the most comprehensive benchmarks for instance segmentation, containing 2975 training, 500 validation, and 1525 test images labeled with 8 semantic classes. We follow the two studies in using an alternative split of the original dataset, since the original test images do not provide ground-truth instances. The new partitions consist of 40174 / 3448 / 8440 image crops for the train / validation / test sets, each of size $224\times 224$ .

Training

We use two losses for training the model, a multi-resolution rasterization loss, and a smoothness loss. The losses are defined as:

[TABLE]

where $D_{res}$ is the DDSL rasterization at resolution $res$ , $G_{\theta}^{(i)}$ is the polygon output from the polygon generator network parameterized by $\theta$ , up to level $i$ , $x$ and $y$ are the input images and the ground-truth polygons, $A_{j}$ is the $j$ -th angle of the polygon, and $\lambda$ is the smoothness penalty term. We train the model (see Fig. 4) end-to-end using the loss defined above. We weight the loss of each class inversely proportional to the label frequencies in the training set. See more details in Appendix B3.

Results

We evaluate our model against state-of-the-art models and detail the IoU results in Table 2. Runtime (Table 3) is evaluated on a single Titan X (Pascal) GPU. We provide a visual comparison in Fig. 7. Our model surpasses the state of the art in class-averaged IoU. The simplicity of our network architecture is highlighted in Table 3. Because Polygon-RNN++ cannot propagate gradients through IoU scores, it uses IoU as a reward for an additional reinforcement learning model, which adds complexity to the overall architecture. It also uses an additional graph neural network to upsample and fine-tune the polygons. Thanks to the differentiable rasterization loss, our model uses a single CNN-based polygon generator. In comparison to Polygon-RNN++, our model achieves a speed-up of nearly two orders of magnitude (2.3241 s vs. 0.0287 s per batch, Table 3) with a quarter of the total model parameters.

5 Conclusion

We propose the DDSL as a differentiable simplex layer for neural networks. We present a unifying framework for differentiable rasterization of arbitrary geometrical signals represented on a simplicial complex. We further show two geometric applications of this method: we can effectively propagate gradients across the DDSL for shape optimization, and we can utilize the DDSL to construct a differentiable rasterization loss that allows for a simple, yet effective, polygon generating network that surpasses state of the art in segmentation IoU as well as runtime and parameter efficiency.

6 Acknowledgements

We would like to thank Thomas Funkhouser and Avneesh Sud for helpful discussions. We appreciate help from Ling Huan for providing code and data for benchmarking our results against Polygon-RNN++. This work is supported by a TUM-IAS Rudolf Mößbauer Fellowship and the ERC Starting Grant Scan2CAD (804724).

A Mathematical Derivations

A1 NUFT Derivative Derivation

Proof of Lemma 3.1.

Using Jacobi’s formula and chain rule,

[TABLE]

where $\tilde{A}$ is $\mathrm{adj}(\hat{B}_{n}^{j})$ and $\tilde{D}$ is $\frac{\partial\hat{B}_{n}^{j}}{\partial\bm{x}_{p}}$. Since $\hat{B}_{n}^{j}$ is symmetric, its adjugate and its derivative with respect to $\bm{x}_{p}$ are also symmetric. The elements on the diagonal and in the first row and column of $\tilde{D}$ are zero, since the elements in the same positions in $\hat{B}_{n}^{j}$ are constant. The elements not in the $(p+1)$th row or the $(p+1)$th column of $\tilde{D}$ are also zero, since the elements in these positions in $\hat{B}_{n}^{j}$ do not depend on $\bm{x}_{p}$. Thus,

[TABLE]

Each nonzero element of $\tilde{D}$ is computed as follows:

[TABLE]

It follows that the double summation term in Eqn. 21 simplifies to

[TABLE]

For clarity and ease of implementation, we re-index Eqn. 25; the derivative of the content distortion factor is then

[TABLE]

∎
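Jacobi's formula, the starting point of the proof above, states that $\frac{d}{dt}\det A(t)=\mathrm{tr}\left(\mathrm{adj}(A(t))\,\frac{dA(t)}{dt}\right)$. The sketch below checks it numerically on a hypothetical $3\times 3$ symmetric matrix with a zero diagonal and a constant first row and column, mirroring the structure described in the proof. The matrix itself and all function names are assumptions chosen for illustration and are not the paper's $\hat{B}_{n}^{j}$.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def adjugate3(m):
    """Adjugate (transpose of the cofactor matrix) of a 3x3 matrix."""
    def minor(r, c):
        rows = [x for x in range(3) if x != r]
        cols = [x for x in range(3) if x != c]
        return (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    cof = [[(-1) ** (r + c) * minor(r, c) for c in range(3)] for r in range(3)]
    return [[cof[c][r] for c in range(3)] for r in range(3)]  # transpose

def trace_of_product(a, b):
    """tr(a @ b), computed without forming the full product."""
    return sum(a[r][k] * b[k][r] for r in range(3) for k in range(3))

def matrix_at(t):
    """Hypothetical symmetric matrix; only entries (1,2) and (2,1) depend on t."""
    return [[0.0, 1.0, 1.0],
            [1.0, 0.0, t * t],
            [1.0, t * t, 0.0]]

def matrix_derivative_at(t):
    """Entrywise derivative of matrix_at with respect to t."""
    return [[0.0, 0.0, 0.0],
            [0.0, 0.0, 2.0 * t],
            [0.0, 2.0 * t, 0.0]]

t, h = 0.7, 1e-6
analytic = trace_of_product(adjugate3(matrix_at(t)), matrix_derivative_at(t))
numeric = (det3(matrix_at(t + h)) - det3(matrix_at(t - h))) / (2 * h)
```

For this matrix the determinant is $2t^{2}$, so both `analytic` and `numeric` should be close to $4t$. Only the two symmetric entries that depend on $t$ contribute to the trace, which is the same sparsity argument the proof applies to $\tilde{D}$.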

Proof of Lemma 3.2.

By the sum rule,

[TABLE]

We examine two cases, when $t=p$ and when $t\neq{p}$ . For $t=p$ ,

[TABLE]

For $t\neq{p}$ ,

[TABLE]

Thus,

[TABLE]

∎

Derivation of Eqn. 14.

Using the product rule,

[TABLE]

We obtain Eqn. 14 by substituting Eqns. 11 and 13 into Eqn. 44. ∎

A2 Control Points

We use linear blend skinning to control mesh deformation using control points. The new position of a point $\bm{v}^{\prime}$ on the shape is computed as the weighted sum of handle transformations applied to its rest position $\bm{v}$ :

[TABLE]

where $\bm{T}_{j}$ is the transformation matrix for the $j$-th control point and $w_{j}(\bm{v})$ is the normalized weight on vertex $\bm{v}$ corresponding to control point $j$. The transformation is represented in homogeneous coordinates, hence the extra dimension.

Consider control points with 3 degrees of freedom: $(t_{x},t_{y},\theta)$ where $t_{x}$ and $t_{y}$ represent translations in $x$ and $y$ and $\theta$ represents rotation around that control point. Hence we have

[TABLE]

where $\tilde{\theta_{j}}$ is the original orientation of the control point. Its value does not matter, since we take derivatives with respect to $\theta$ and the $\tilde{\theta_{j}}$ terms vanish. The Jacobian of $\bm{v}^{\prime}$ with respect to the three degrees of freedom is:

[TABLE]
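As a minimal sketch of linear blend skinning in 2D, the code below builds one homogeneous transform per control point from $(t_{x},t_{y},\theta)$ and blends them with normalized weights. The convention used here (rotate about the control point's own position, then translate), the function names, and the example values are assumptions made for illustration; they are not taken from the equations above.

```python
import math

def handle_transform(tx, ty, theta, center):
    """3x3 homogeneous matrix: rotate by theta about `center`, then translate.

    Rotating about the control point's own position is an assumption of this
    sketch; the exact matrix used in the text is not reproduced here.
    """
    cx, cy = center
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, cx - c * cx + s * cy + tx],
            [s, c, cy - s * cx - c * cy + ty],
            [0.0, 0.0, 1.0]]

def blend(v, weights, transforms):
    """Linear blend skinning: v' = sum_j w_j(v) * T_j * [v, 1]."""
    x, y = v
    out_x = out_y = 0.0
    for w, t in zip(weights, transforms):
        out_x += w * (t[0][0] * x + t[0][1] * y + t[0][2])
        out_y += w * (t[1][0] * x + t[1][1] * y + t[1][2])
    return out_x, out_y

# Two hypothetical control points and the normalized weights of one vertex.
centers = [(0.0, 0.0), (1.0, 0.0)]
params = [(0.0, 0.0, math.pi / 2), (0.5, 0.0, 0.0)]  # (t_x, t_y, theta)
transforms = [handle_transform(*p, c) for p, c in zip(params, centers)]
moved = blend((1.0, 0.0), [0.5, 0.5], transforms)
```

The first handle rotates the vertex to $(0,1)$ and the second translates it to $(1.5,0)$, so the equally weighted blend lands near $(0.75,0.5)$. Because each translation enters linearly under this convention, the derivative of the blended position with respect to a handle's $t_{x}$ is simply that handle's weight in the $x$ component.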

B Network Architecture and Training Details

In this section, we detail all the network architectures and training routines for the reader’s reference.

B1 MNIST

We use a LeNet-5-style architecture with 2 convolutional layers and 2 fully connected layers.

Network Architecture

The input is a 28x28 pixel image, which is normalized according to the mean and standard deviation of the entire dataset. The network architecture is as follows:

Conv(1, 10, 5, 1) + MaxPool(2) + ReLU $\rightarrow$ Conv(10, 20, 5, 1) + Dropout + MaxPool(2) + ReLU $\rightarrow$ FC(320, 250) + ReLU $\rightarrow$ Dropout $\rightarrow$ FC(250, 10)

Total number of parameters: 88,040
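The stated parameter count can be reproduced with a short calculation. This sketch assumes that `Conv(a, b, k, s)` denotes input channels, output channels, kernel size, and stride, that the convolutions are unpadded, and that every layer has a bias; these readings of the notation are assumptions, and the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a square 2D convolution."""
    return out_ch * in_ch * kernel * kernel + out_ch

def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

def conv_out(size, kernel, stride=1):
    """Spatial size after an unpadded convolution or pooling step."""
    return (size - kernel) // stride + 1

# Spatial size: 28 -> conv5 -> 24 -> pool2 -> 12 -> conv5 -> 8 -> pool2 -> 4
size = conv_out(conv_out(conv_out(conv_out(28, 5), 2, 2), 5), 2, 2)
flattened = 20 * size * size  # 20 channels of 4x4 feature maps

total = (conv_params(1, 10, 5) + conv_params(10, 20, 5)
         + fc_params(flattened, 250) + fc_params(250, 10))
```

Under these assumptions the flattened feature size is 320, matching the input of `FC(320, 250)`, and the total comes to 88,040.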

Training Details

We train the neural network with a batch size of 64 and an initial learning rate of $1\times 10^{-2}$, decayed by a factor of $0.5$ every 10 epochs. We use the Stochastic Gradient Descent optimizer with a momentum of $0.5$ and a cross-entropy loss.
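A minimal sketch of such a learning-rate schedule, assuming the decay is applied as a staircase (one multiplication at every multiple of the step) rather than continuously. The function name and the sampled epochs are hypothetical.

```python
def step_decay_lr(epoch, initial_lr, factor, step):
    """Learning rate after multiplying by `factor` once every `step` epochs."""
    return initial_lr * factor ** (epoch // step)

# MNIST settings from the text: start at 1e-2, multiply by 0.5 every 10 epochs.
mnist_lrs = [step_decay_lr(e, 1e-2, 0.5, 10) for e in (0, 9, 10, 25)]
```

The rate stays at $1\times 10^{-2}$ through epoch 9, drops to $5\times 10^{-3}$ at epoch 10, and is $2.5\times 10^{-3}$ by epoch 25. The same helper covers any schedule of this form by changing `factor` and `step`.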

B2 Airfoil

We use ResNet-50 [13] followed by three fully connected layers to predict the lift-drag ratio on the airfoil.

Network Architecture

The input is a 224x224 pixel image of the airfoil. For each piece of data, we append the Reynolds number and angle of attack after ResNet-50 and before the fully connected layers. The network architecture is as follows:

ResNet-50(1000) + BN + ReLU $\rightarrow$ append Reynolds number and angle of attack $\rightarrow$ FC(1002, 512) + BN + ReLU $\rightarrow$ FC(512, 64) + BN + ReLU $\rightarrow$ FC(64, 32) + BN

Total number of parameters: 26,100,345

Training Details

We train the neural network with a batch size of 240 and an initial learning rate of $1\times 10^{-2}$, decayed by a factor of $1\times 10^{-1}$ every 20 epochs. We use the Adam optimizer and a mean squared error loss.

B3 Polygon Image Segmentation

We present a novel polygon decoder architecture that takes the features of a standard pre-trained ResNet-50 as input.

Network Architecture

The model architecture is detailed in Fig. 4. All ground-truth polygons are normalized to the range [0,1), corresponding to relative positions within the bounding boxes. Using this network architecture, we first predict the three $(x,y)$ coordinates associated with the base triangle. Then, we progressively predict the offsets of the vertices in the next polygon hierarchy (see Fig. 3). The resulting polygon is rasterized with the DDSL to compute the rasterization loss against the rasterized target. The smoothness loss can be computed directly from the vertex positions and does not require rasterization.
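To illustrate why the smoothness term needs no rasterization, the sketch below computes the angle at every polygon vertex from the vertex coordinates alone. The function name and the unsigned-angle convention are assumptions; how the angles $A_{j}$ enter the penalty is defined by the loss in Section 4.3 and is not reproduced here.

```python
import math

def vertex_angles(polygon):
    """Unsigned angle (radians, in [0, pi]) between the two edges at each vertex."""
    n = len(polygon)
    angles = []
    for j in range(n):
        px, py = polygon[j - 1]        # previous vertex (wraps around)
        cx, cy = polygon[j]            # current vertex
        nx, ny = polygon[(j + 1) % n]  # next vertex (wraps around)
        ax, ay = px - cx, py - cy      # edge towards the previous vertex
        bx, by = nx - cx, ny - cy      # edge towards the next vertex
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        angles.append(math.atan2(abs(cross), dot))
    return angles

# Hypothetical polygon in normalized bounding-box coordinates.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
angles = vertex_angles(square)
```

Every vertex of the unit square yields $\pi/2$. Since the angles are smooth functions of the vertex coordinates away from degenerate (zero-length) edges, a penalty built on them can be differentiated with respect to the predicted vertices directly.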

Total number of parameters: 24,274,426

Training Details

We train the network end-to-end with a batch size of 48 and a learning rate of $10^{-3}$ for 200 epochs. We use a smoothness penalty of $\lambda=1$ and the Adam optimizer.

C Additional Computational Efficiency Tests

In addition to the computational speed benchmarks in Fig. 5 highlighting the performance gain of analytic derivative computation over numerical derivatives, we perform additional tests for 2D and 3D computation speeds on more complex polygons and meshes to show the applicability of DDSL to 2D and 3D computer vision problems.

D 3D Geometric Applications

To showcase the generalizability of the DDSL to the 3D domain, we demonstrate its application in two separate 3D tasks that utilize the differentiability of the simplex rasterization layer.

D1 3D Rotational Pose Estimation

In Fig. 8, we use the DDSL to create a differentiable volumetric loss comparing the current and target shapes, the gradients of which can be backpropagated to the pose. More specifically, we parameterize the rotational pose as a unit quaternion $\bm{q}=a+b\hat{\bm{i}}+c\hat{\bm{j}}+d\hat{\bm{k}}$, subject to $\|\bm{q}\|_{2}=1$. The rasterization loss is defined as:

[TABLE]

where $D_{32}$ is the rasterization operator at resolution $32^{3}$ and $V_{tg}$ is the target mesh.

Although the volumetric rasterization loss is not globally convex for pose alignment, the pose can still be estimated by minimizing the DDSL rasterization loss for certain initializations relative to the target pose.
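The quaternion parameterization can be sketched as follows. The code re-normalizes $\bm{q}$ to satisfy the unit-norm constraint and applies the standard quaternion-to-rotation-matrix formula to a point. Enforcing the constraint by re-normalization, as well as the function names, are assumptions of this sketch rather than details given in the text.

```python
import math

def normalize(q):
    """Project a quaternion (a, b, c, d) onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in q))
    return tuple(x / norm for x in q)

def rotation_matrix(q):
    """3x3 rotation matrix of a unit quaternion q = a + b i + c j + d k."""
    a, b, c, d = q
    return [
        [1 - 2 * (c * c + d * d), 2 * (b * c - a * d), 2 * (b * d + a * c)],
        [2 * (b * c + a * d), 1 - 2 * (b * b + d * d), 2 * (c * d - a * b)],
        [2 * (b * d - a * c), 2 * (c * d + a * b), 1 - 2 * (b * b + c * c)],
    ]

def rotate(q, point):
    """Rotate a 3D point by the (re-normalized) quaternion q."""
    r = rotation_matrix(normalize(q))
    return tuple(sum(r[i][k] * point[k] for k in range(3)) for i in range(3))

# 90-degree rotation about the z-axis, given as an unnormalized quaternion.
rotated = rotate((1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
```

The point $(1,0,0)$ is mapped to approximately $(0,1,0)$. In a pose-estimation loop, every mesh vertex would be rotated this way before rasterization, so that the loss gradient flows back to $(a,b,c,d)$.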

D2 Single Image Mesh Estimation

In Fig. 9, we evaluate our method in the context of 3D deep learning. Our model consists of a ResNet-18 image encoder, spherical convolutions [17] for generating a distortion map for a spherical mesh, and a loss function that is a weighted sum of the DDSL rasterization loss (at $32^{3}$ resolution), a Chamfer loss from point samples, a Laplacian regularization loss, and an edge-length regularization loss. We train on the airplane category of the ShapeNet dataset, with (w/) and without (w/o) the DDSL loss. We evaluate using accuracy, completeness, and Chamfer distance metrics (see Tab. 7).

Since the surface-based Chamfer distance neither signals the network to produce consistently oriented surfaces nor requires the surface to consistently enclose a volume, it leads to incorrectly oriented surfaces. The DDSL loss effectively regularizes surface orientation based on the volume enclosed according to the surface orientations, and improves overall results.
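The Chamfer term compares only point positions, which is why it carries no orientation signal. A minimal sketch follows, assuming the common variant that sums the mean squared nearest-neighbor distance in both directions; the exact variant used in the experiments is not specified in the text, and the point sets are hypothetical.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance, both ways."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_sided(src, dst):
        # For each source point, squared distance to its closest target point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)

# Two hypothetical point samples that differ in a single point.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
d = chamfer_distance(a, b)
```

Here each direction contributes 0.5, giving a distance of 1.0. Flipping the vertex order of a triangle leaves its sampled points, and therefore this value, unchanged, whereas a signed volumetric rasterization of the kind the DDSL loss uses would change.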

E Additional 3D Visualizations

We provide visualizations for rasterizing 3D shapes, rasterizing the enclosed volume as well as the surface mesh.

Bibliography

[1] David Acuna, Huan Ling, Amlan Kar, and Sanja Fidler. Efficient interactive annotation of segmentation datasets with polygon-rnn++. arXiv preprint arXiv:1803.09693, 2018.
[2] Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems, pages 3189–3197, 2016.
[3] Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[4] Lluis Castrejon, Kaustav Kundu, Raquel Urtasun, and Sanja Fidler. Annotating object instances with a polygon-rnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5230–5238, 2017.
[5] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
[6] Angela Dai and Matthias Nießner. 3dmv: Joint 3d-multi-view prediction for 3d semantic scene segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[7] Angela Dai, Charles Ruizhongtai Qi, and Matthias Nießner. Shape completion using 3d-encoder-predictor cnns and shape synthesis. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[8] Angela Dai, Daniel Ritchie, Martin Bokeloh, Scott Reed, Jürgen Sturm, and Matthias Nießner. Scancomplete: Large-scale scene completion and semantic segmentation for 3d scans. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2018.
