Representation Invariance and Allocation: When Subgroup Balance Matters

Anissa Alloula; Charles Jones; Zuzanna Wakefield-Skorniewska; Francesco Quinzan; Bartłomiej Papież

arXiv:2512.09496 · cs.LG · December 11, 2025

Open Access · 3 Reviews

TL;DR

This paper investigates how subgroup representation in training data affects model performance, revealing that the importance of data balance depends on latent subgroup separation in pre-trained models, with implications for data collection.

Contribution

It introduces the latent separation hypothesis, formalizes it, and empirically validates how latent subgroup separation influences the need for balanced data in model fine-tuning.

Findings

01

Imbalanced data can sometimes improve subgroup performance.

02

Latent separation in pre-trained models predicts sensitivity to data imbalance.

03

Quantitative analysis of latent separation guides data balancing decisions.

Abstract

Unequal representation of demographic groups in training data poses challenges to model generalisation across populations. Standard practice assumes that balancing subgroup representation optimises performance. However, recent empirical results contradict this assumption: in some cases, imbalanced data distributions actually improve subgroup performance, while in others, subgroup performance remains unaffected by the absence of an entire subgroup during training. We conduct a systematic study of subgroup allocation across four vision and language models, varying training data composition to characterise the sensitivity of subgroup performance to data balance. We propose the latent separation hypothesis, which states that a partially fine-tuned model's dependence on subgroup representation is determined by the degree of separation between subgroups in the latent space of the pre-trained…
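The abstract is cut off before it says how latent separation is measured, so the following is only a minimal sketch of one possible proxy, not the authors' metric. It assumes subgroup embeddings are available as NumPy arrays and estimates a total variation (TV) distance, the quantity Reviewer 01 mentions, from histograms of the embeddings projected onto the difference of subgroup means. The function name `tv_separation`, the projection step, and the bin count are all illustrative assumptions.

```python
import numpy as np

def tv_separation(z_a, z_b, bins=20):
    """Histogram estimate of the TV distance between two subgroups' embeddings.

    Assumption (not from the paper): embeddings are projected onto the unit
    vector joining the two subgroup means, then compared in one dimension.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    direction = z_a.mean(axis=0) - z_b.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        # Identical means: this particular 1-D projection shows no separation.
        return 0.0
    direction = direction / norm
    p_a = z_a @ direction
    p_b = z_b @ direction
    # Shared bin edges so the two histograms are directly comparable.
    edges = np.linspace(min(p_a.min(), p_b.min()), max(p_a.max(), p_b.max()), bins + 1)
    h_a, _ = np.histogram(p_a, bins=edges)
    h_b, _ = np.histogram(p_b, bins=edges)
    h_a = h_a / h_a.sum()
    h_b = h_b / h_b.sum()
    # TV distance between discrete distributions: half the L1 distance.
    return 0.5 * np.abs(h_a - h_b).sum()

# Synthetic stand-ins for pre-trained embeddings (purely illustrative data).
rng = np.random.default_rng(0)
overlapping = tv_separation(rng.normal(0, 1, (500, 8)), rng.normal(0, 1, (500, 8)))
separated = tv_separation(rng.normal(0, 1, (500, 8)), rng.normal(4, 1, (500, 8)))
print(f"overlapping subgroups: {overlapping:.2f}, separated subgroups: {separated:.2f}")
```

Under the hypothesis as stated in the abstract, a value near 0 would suggest low sensitivity to subgroup allocation and a value near 1 high sensitivity. Histogram estimates of TV are biased upward on finite samples, so small values should be read with care.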

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper tackles an important problem in algorithmic fairness.
2. The authors derive a simple and intuitive TV upper bound.

Weaknesses

1. Though Theorem 5.1 provides some good intuition, it does not sufficiently characterize the empirical phenomenon studied in the paper. In particular, the authors should derive a bound on the slope of the per-subgroup balanced accuracy vs. subgroup proportion (Figure 2), as a function of the TV or some other terms.
2. The assumption that $P(Y)$ stays the same across the two datasets in L290 seems very strong. In particular, if we have $P(Y \mid A=0) \neq P(Y \mid A=1)$, it seems like changing the subg
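One way to read point 2, whose text is cut off here, is through the law of total probability. The symbol $\pi = P(A=1)$ for the subgroup proportion is introduced only for this illustration and is not taken from the paper:

$$P(Y=y) = \pi \, P(Y=y \mid A=1) + (1-\pi) \, P(Y=y \mid A=0)$$

$$\frac{\partial}{\partial \pi} P(Y=y) = P(Y=y \mid A=1) - P(Y=y \mid A=0)$$

If the conditionals $P(Y \mid A)$ are held fixed, the derivative vanishes only when $P(Y=y \mid A=1) = P(Y=y \mid A=0)$. Otherwise, varying $\pi$ shifts the label marginal $P(Y)$, and keeping $P(Y)$ constant would require resampling labels within subgroups, which changes $P(Y \mid A)$ itself.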

Reviewer 02 · Rating 4 · Confidence 3

Strengths

The paper presents an attempt to explain when balancing subgroup data affects model performance. The main idea to link subgroup performance sensitivity to the similarity of their latent representations is clear. The experiments are consistent across datasets and the analysis is carefully done, though mostly within limited and controlled setups. The paper is clearly written and the results are easy to follow.

Weaknesses

1. Doesn’t account for subgroup difficulty. The paper assumes subgroup performance differences come mainly from representation separation, but it doesn’t consider that some subgroups might just be inherently harder to learn. For example, a subgroup with noisier labels or less distinctive features may naturally require more data. Without checking this, the correlation between latent separation and sensitivity could partly reflect difficulty, not just representation overlap.
2. Fixed fine-tuning

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. This is an important problem and one that I do not think has been addressed yet, to my knowledge.
2. Including both the task-specific and the foundation model experiments adds depth to the experimental results and supports interesting comparisons.
3. I thought the limitations section was insightful and well thought out.
4. Formal statements are relevant and appear to be correct though I have not checked all details in the appendix.
5. Related work is sufficient to the best of my knowledge.

Weaknesses

*1. Use of linear scaling laws.* The authors use linear fits to describe the relationship between subgroup allocation and group performance in Figure 2 and Figure 3. As the authors note, this deviates from past work (Rolf et al. 2021); it also deviates from what one would expect from statistical learning theory. I understand that linear models are easier and less sensitive to fit. I disagree with the authors' assessment that the trends in Figure 2 look linear. The linear f

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications