Lifting Architectural Constraints of Injective Flows

Peter Sorrenson; Felix Draxler; Armand Rousselot; Sander Hummerich; Lea Zimmermann; Ullrich Köthe

arXiv:2306.01843 · cs.LG · June 28, 2024 · 1 cites

TL;DR

This paper introduces an efficient method for training Injective Flows that jointly learn data manifolds and distributions, overcoming previous architectural and computational limitations, and demonstrating competitive results across various data types.

Contribution

It proposes a new stable maximum likelihood training objective for Injective Flows compatible with flexible architectures, enabling better manifold learning.

Findings

01. Efficient estimator for maximum likelihood compatible with free-form architectures

02. Stable training method prevents divergence when learning data manifolds

03. Competitive performance demonstrated on toy, tabular, and image datasets

Abstract

Normalizing Flows explicitly maximize a full-dimensional likelihood on the training data. However, real data is typically only supported on a lower-dimensional manifold, leading the model to expend significant compute on modeling noise. Injective Flows fix this by jointly learning a manifold and the distribution on it. So far, they have been limited by restrictive architectures and/or high computational cost. We lift both constraints by a new efficient estimator for the maximum likelihood loss, compatible with free-form bottleneck architectures. We further show that naively learning both the data manifold and the distribution on it can lead to divergent solutions, and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model.
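As a point of orientation, the likelihood that an injective flow assigns to a point on its manifold follows the standard injective change of variables, log p(x) = log p_Z(z) - 1/2 log det(J^T J), where J is the decoder Jacobian. The sketch below evaluates this exactly for a linear toy decoder. It is not the paper's estimator or training objective: the function name `manifold_log_density`, the matrix `A`, the offset `b`, and the standard-normal latent are all assumptions chosen for illustration.

```python
import numpy as np

def manifold_log_density(x, A, b):
    """Toy injective model x = A z + b with z ~ N(0, I_d) (hypothetical setup).

    Returns the log-density on the decoder's image and the squared
    reconstruction error, which is zero only for points on that image.
    """
    d = A.shape[1]
    # Encoder: the pseudo-inverse is a left inverse of the linear decoder.
    z = np.linalg.pinv(A) @ (x - b)
    # Standard-normal latent log-density.
    log_pz = -0.5 * (z @ z) - 0.5 * d * np.log(2.0 * np.pi)
    # Injective change of variables: volume term 1/2 log det(J^T J), with J = A.
    _, logdet = np.linalg.slogdet(A.T @ A)
    # Squared distance between x and its reconstruction A z + b.
    recon = np.sum((A @ z + b - x) ** 2)
    return log_pz - 0.5 * logdet, recon

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))            # decoder Jacobian, D = 3, d = 2
b = rng.normal(size=3)
x_on = A @ np.array([0.5, -1.0]) + b   # a point on the 2-D manifold
log_p, recon = manifold_log_density(x_on, A, b)
```

For a nonlinear decoder the Jacobian changes from point to point, so evaluating the log-determinant term exactly becomes expensive. That cost is what the abstract refers to when it mentions high computational cost and an efficient estimator.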

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 5

Strengths

I'll enumerate the strengths below for ease of reference in discussion. These are not listed in order of importance.

1. The paper is generally very well written and comes across as quite polished. I'll outline below:
- The introduction is very clean. The motivation for the method is clearly laid out.
- The paper is well-situated amongst the related work.
- The background is easy to digest.
- The path to the final model in Section 4 is laid out well.
- The appendix is quite thorough

Weaknesses

I'll write weaknesses in a list as well. Again this list is not ordered in terms of importance.

1. In the end, this paper could be summarized as simply training autoencoders with a different training loss, with the loss motivated by previous work in injective flows. The novelty and significance of this particular choice of loss over other types of autoencoder losses is not completely clear for a couple of reasons: (i) Table 3 is not convincing, as the best results are still produced by other au

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

The presented techniques are original, with contributions on removing limitations from prior methods. The presented techniques are interesting and potentially valuable.

Weaknesses

The clarity should be significantly improved. For example, many important derivations should be moved to the main manuscript, and important assumptions should be highlighted.

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 5

Strengths

This paper has several strengths:

1. It is well written and easy to follow, and I think the authors did a good job of deciding which material to include in the main manuscript and which details to include in the appendix.
2. It is well motivated, as I agree with the authors that the current injective flow literature uses overly restrictive architectures and/or computationally intensive training procedures.
3. The simple observation that, when the encoder $f$ is the left inverse of the decoder

Weaknesses

5. In my view, the main weakness of the paper is the lack of ablations. As mentioned, the paper proposes 3 improvements over rectangular flows, and it is unclear how much each of these contributes to the empirical performance of the proposed method. I think table 1 provides a perfect test bed to carry out these ablations: results using the same architecture as rectangular flows should be added to the table, both (a) using the gradient estimator from eq 10, and (b) that from eq 16. Ideally, using

Code & Models

Videos

Lifting Architectural Constraints of Injective Flows · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · Anomaly Detection Techniques and Applications

Methods: Normalizing Flows