Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
Chi-Yao Huang, Khoa Vo, Aayush Atul Verma, Duo Lu, Yezhou Yang

TL;DR
The paper introduces Domain Expansion, a framework that constructs a structured, orthogonal latent space to improve multi-task learning by preventing conflicting gradients and enabling interpretable representations.
Contribution
It proposes a novel orthogonal pooling mechanism to create a mutually orthogonal latent space for multiple objectives, addressing latent representation collapse.
Findings
Prevents latent representation collapse in multi-task learning.
Produces an interpretable and manipulable latent space.
Reports stronger performance than baselines across benchmarks including ShapeNet, MPIIGaze, and Rotated MNIST.
Abstract
Training a single network with multiple objectives often leads to conflicting gradients that degrade shared representations, forcing them into a compromised state that is suboptimal for any single task--a problem we term latent representation collapse. We introduce Domain Expansion, a framework that prevents these conflicts by restructuring the latent space itself. Our framework uses a novel orthogonal pooling mechanism to construct a latent space where each objective is assigned to a mutually orthogonal subspace. We validate our approach across diverse benchmarks--including ShapeNet, MPIIGaze, and Rotated MNIST--on challenging multi-objective problems combining classification with pose and gaze estimation. Our experiments demonstrate that this structure not only prevents collapse but also yields an explicit, interpretable, and compositional latent space where concepts can be directly…
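The idea of giving each objective its own mutually orthogonal direction can be illustrated with a small NumPy sketch. This is not the authors' orthogonal pooling implementation, which this page does not specify. It assumes, following the reviewers' description, that the axes come from the principal eigenvectors of the latent covariance matrix. The one-axis-per-objective mapping and the names `orthogonal_axes` and `pool` are hypothetical.

```python
import numpy as np

def orthogonal_axes(latents, num_tasks):
    """Return one unit-norm axis per task, taken from the top eigenvectors
    of the latent covariance (assumed construction, not the paper's code)."""
    centered = latents - latents.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(latents) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:num_tasks]     # indices of the largest eigenvalues
    return eigvecs[:, top].T                        # shape: (num_tasks, dim)

def pool(latents, axes):
    """Project each latent onto every task axis; column t holds the task-t coordinate."""
    return latents @ axes.T

# Toy latents standing in for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
latents = rng.normal(size=(256, 16)) * np.linspace(3.0, 0.5, 16)

axes = orthogonal_axes(latents, num_tasks=3)
coords = pool(latents, axes)
gram = axes @ axes.T   # close to the identity when the axes are orthonormal
```

Because `np.linalg.eigh` returns orthonormal eigenvectors for a symmetric matrix, `gram` is the identity up to floating-point error, so a change along one task's axis leaves the other tasks' coordinates untouched.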
Peer Reviews
Decision: ICLR 2026 Poster
1. **Novel and intuitive framing:** The notion of latent representation collapse as a geometric phenomenon in shared latent spaces is well-motivated and connects neatly to known issues in MTL such as negative transfer and conflicting gradients.
2. **Architectural elegance:** Orthogonal pooling is conceptually simple yet effective. It shifts focus from gradient-level interventions (as in PCGrad or Nash-MTL) to a proactive representation-level solution.
3. **Compositional algebra:** The inclusio
# Major Concerns
1. **Empirical scope is narrow.** Experiments focus exclusively on relatively small, controlled datasets (ShapeNet, MPIIGaze, Rotated MNIST). These are synthetic or low-dimensional settings. The paper’s claims of scalability and generality (e.g., toward fairness or multimodal learning in Sec. 6) are not yet substantiated.
2. **Unclear stability and computational cost.** The method requires per-epoch covariance estimation and eigendecomposition of large latent spaces (2048-D
1. The proposed approach tackles the problem of multi-objective optimization from a different perspective than prior work.
2. The proposed orthogonal domain is intuitive and well-motivated. The authors illustrate the general problem it should solve and demonstrate through analyses that it is indeed an issue in practice.
3. The domain expansion approach is principled and the authors describe its operations and properties.
4. The authors show that their method outperforms baselines
1. Training inefficiency: Eigenvectors need to be calculated and features need to be projected at every epoch. In addition, the Hungarian algorithm needs to be used to align eigenvectors across training epochs. It would be useful to provide some analysis regarding the training times of the proposed method vs the baselines to understand to what extent this is a limitation in practice.
2. Weak base model: The method is only applied to a fairly old model (ResNet-50). It would be helpful to app
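The alignment step the reviewer describes can be sketched as an assignment problem: eigenvectors recomputed at a new epoch may come back in a different order or with flipped signs, so each new axis is matched to the previous axis it is most similar to. The sketch below is an assumption about how such an alignment could work, not the authors' procedure. It uses SciPy's `linear_sum_assignment`, which solves the same assignment problem the Hungarian algorithm addresses. The function name `align_axes` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_axes(prev_axes, new_axes):
    """Reorder and sign-flip the rows of new_axes so that row i best matches
    row i of prev_axes. Both inputs hold unit-norm axes as rows."""
    sim = prev_axes @ new_axes.T                       # sim[i, j]: cosine of prev i vs new j
    rows, cols = linear_sum_assignment(-np.abs(sim))   # maximize total |cosine|
    signs = np.sign(sim[rows, cols])
    signs[signs == 0] = 1.0                            # keep orthogonal matches unflipped
    aligned = np.empty_like(new_axes)
    aligned[rows] = new_axes[cols] * signs[:, None]    # undo permutation and sign flips
    return aligned

# Toy example: the "new" axes are the previous ones, permuted and negated.
prev_axes = np.eye(4)[:3]
new_axes = -prev_axes[[2, 0, 1]]
aligned = align_axes(prev_axes, new_axes)
```

Matching on the absolute cosine is a design choice made here because an eigenvector and its negation span the same direction; the sign is restored afterwards so coordinates stay comparable across epochs.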
* **S1. Clarity and Presentation:** The paper is well-written and easy to follow. The authors do a great job of motivating the problem with the clear concept of "latent representation collapse." The flow from the problem statement to the proposed method is logical and intuitive. The figures are highly effective at illustrating the core idea.
* **S2. Originality and Elegance of the Method:** The central idea of harnessing the principal eigenvectors of the latent space's covariance matrix to form a
* **W1. Scope of Empirical Evaluation:** The experimental validation, while thorough on the chosen dataset, could be broadened to better establish the generalizability of the method.
  * **Architectural Diversity**: The experiments are conducted using only a ResNet-50 encoder. It would be beneficial to understand if the method is equally effective with other modern architectures, such as Vision Transformers, which may have different latent space geometries.
  * **Dataset Diversity**: The pri
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
