What We Don't C: Manifold Disentanglement for Structured Discovery

Brian Rogers; Micah Bowles; Chris J. Lintott; Steve Croft; Oliver N. F. King; James Kostas Ray

arXiv:2511.09433·cs.AI·March 12, 2026

Open Access · 3 Reviews

TL;DR

This paper presents What We Don't C, a method that disentangles latent representations by removing conditioned information, enabling better exploration of unrepresented factors in high-dimensional data.

Contribution

It introduces a novel latent flow matching technique for explicit disentanglement of latent subspaces by removing conditional information.

Findings

01 · Enhances interpretability of latent representations.

02 · Facilitates discovery of unmodeled data factors.

03 · Provides a simple mechanism for analyzing and controlling generative models.

Abstract

Accessing information in learned representations is critical for annotation, discovery, and data filtering in disciplines where high-dimensional datasets are common. We introduce What We Don't C, a novel approach based on latent flow matching that disentangles latent subspaces by explicitly removing information included in conditional guidance, resulting in meaningful residual representations. This allows factors of variation which have not already been captured in conditioning to become more readily available. We show how guidance in the flow path necessarily represses the information from the guiding, conditioning variables. Our results highlight this approach as a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models to explore what we don't capture, consider, or catalog.
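The abstract does not spell out the training objective, so the following is only a minimal sketch of a generic conditional flow matching loss on latent vectors, assuming a linear interpolation path between the two endpoint distributions. The function names (`interpolate`, `cfm_loss`, `make_linear_velocity`), the toy linear velocity model, the random stand-in latents, and the direction of the flow are all illustrative assumptions, not the authors' implementation. The point it illustrates is that the velocity model receives the conditioning variable `c` as an input, so whatever `c` explains does not need to be carried by the transported latent.

```python
import numpy as np


def interpolate(z0, z1, t):
    """Linear path x_t = (1 - t) * z0 + t * z1 and its constant velocity z1 - z0."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x_t = (1.0 - t) * z0 + t * z1
    u_t = z1 - z0
    return x_t, u_t


def cfm_loss(velocity_fn, z0, z1, c, t):
    """Mean squared error between predicted and target velocities along the path."""
    x_t, u_t = interpolate(z0, z1, t)
    v = velocity_fn(x_t, t, c)
    return float(np.mean(np.sum((v - u_t) ** 2, axis=1)))


def make_linear_velocity(W_x, W_c, w_t):
    """Hypothetical toy velocity model, linear in the state, the condition, and time."""
    def velocity_fn(x_t, t, c):
        t = np.asarray(t, dtype=float).reshape(-1, 1)
        return x_t @ W_x + c @ W_c + t * w_t
    return velocity_fn


rng = np.random.default_rng(0)
n, d, k = 8, 4, 2                  # batch size, latent dimension, condition dimension
z0 = rng.normal(size=(n, d))       # assumption: stand-in for encoder latents
z1 = rng.normal(size=(n, d))       # assumption: samples from a standard normal base
c = rng.normal(size=(n, k))        # conditioning variables (known factors)
t = rng.uniform(size=n)            # one random time in [0, 1) per sample

velocity_fn = make_linear_velocity(
    rng.normal(size=(d, d)), rng.normal(size=(k, d)), rng.normal(size=d)
)
loss = cfm_loss(velocity_fn, z0, z1, c, t)
print(f"untrained toy model loss: {loss:.3f}")
```

In practice the velocity model would be a neural network trained by minimizing this loss over many batches; the sketch only evaluates the objective once for an untrained model.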

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

1. The idea of combining flow matching with variational autoencoders (VAEs) is interesting and has potential to inspire further exploration in disentangled representation learning.
2. The paper is well-structured.

Weaknesses

1. Since the method is built on top of VAEs and relies on the approximately Gaussian distribution of their latent space, its use is restricted to this specific class of generative models.
2. The paper lacks sufficient supporting evidence. In the experimental section, the authors evaluate the method on synthetic 2D Gaussian data, CMNIST, and a real-world dataset. All three datasets are relatively simple, and other existing disentanglement methods are known to perform well on them, particularly o

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- addresses an important problem in an interesting way (including allowing further disentanglement of pretrained models)
- reasonable breadth of experiments, from simple controlled to complex real datasets, including intuitive results in figures 6 and 7

Weaknesses

- no reproducibility statement or open-source code, which is especially important for less theoretical contributions like this
- no clear contextualization or comparison against existing disentanglement approaches, and no argument for why one is missing
- hard-to-follow theory presentation in sections 2 and 3; maybe I just lack the background, but I guess I'm not the only reader who would benefit from gentler, more precise guidance through it
- unpolished writing

Reviewer 03 · Rating 6 · Confidence 2

Strengths

The paper has a clear and compelling motivation: instead of continuing to reinforce information we already understand in a dataset, it focuses on uncovering what remains after known factors are removed. This conceptual reframing is refreshing and feels genuinely useful, especially for exploratory scientific analysis. The authors present the idea in an intuitive way, and the progression of experiments (from synthetic data to real astrophysics imagery) helps build trust in the approach. The qualit

Weaknesses

The main limitation is that the evaluation remains largely qualitative, making it difficult to assess how well the method performs relative to established baselines in representation learning or disentanglement research. The paper would benefit from more systematic quantitative comparisons or metrics to support its claims. Some of the theoretical explanations around how information is preserved or removed during the latent flow process are also hard to follow and could use clearer intuition rath

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning