pFedBBN: A Personalized Federated Test-Time Adaptation with Balanced Batch Normalization for Class-Imbalanced Data

Md Akil Raihan Iftee; Syed Md. Ahnaf Hasan; Mir Sazzat Hossain; Rakibul Hasan Rajib; Amin Ahsan Ali; AKM Mahbubur Rahman; Sajib Mistry; Monowar Bhuyan

arXiv:2511.18066 · cs.LG · November 25, 2025

Open Access · 3 Reviews

TL;DR

pFedBBN introduces a federated test-time adaptation method using balanced batch normalization to improve performance on class-imbalanced data without requiring labeled data, enhancing robustness and minority-class accuracy.

Contribution

It proposes a novel personalized federated TTA framework with balanced batch normalization and class-aware aggregation, addressing domain shifts and class imbalance without data sharing.
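As a rough illustration of what balanced batch normalization could look like, the sketch below computes normalization statistics in which every observed class contributes equally, regardless of how many samples it has in the batch. This page does not give the paper's exact formulation, so the equal-weight averaging, the variance definition, and the names `balanced_bn_stats` and `balanced_bn` are assumptions made for illustration; the pseudo-labels are assumed to come from some upstream predictor.

```python
from collections import defaultdict


def balanced_bn_stats(features, pseudo_labels):
    """Class-balanced mean and variance per feature dimension (illustrative).

    features: list of equal-length lists of floats, one row per sample.
    pseudo_labels: list of ints, one predicted class per sample (assumed given).
    """
    # Group samples by their pseudo-label.
    groups = defaultdict(list)
    for row, label in zip(features, pseudo_labels):
        groups[label].append(row)

    dim = len(features[0])
    n_classes = len(groups)

    # Per-class mean for every feature dimension.
    class_means = [
        [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        for rows in groups.values()
    ]
    # Balanced mean: each observed class counts once, whatever its size.
    mean = [sum(m[d] for m in class_means) / n_classes for d in range(dim)]

    # Balanced variance (assumption): average over classes of the within-class
    # mean squared deviation from the balanced mean.
    var = []
    for d in range(dim):
        per_class = [
            sum((r[d] - mean[d]) ** 2 for r in rows) / len(rows)
            for rows in groups.values()
        ]
        var.append(sum(per_class) / n_classes)
    return mean, var


def balanced_bn(features, pseudo_labels, eps=1e-5):
    """Normalize each feature with the class-balanced statistics."""
    mean, var = balanced_bn_stats(features, pseudo_labels)
    return [
        [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(row, mean, var)]
        for row in features
    ]
```

With three majority-class samples at 0.0 and one minority-class sample at 10.0, the ordinary batch mean is 2.5, whereas the balanced mean is 5.0, so the minority sample is not pulled toward the majority statistics.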

Findings

1. Consistently improves robustness over state-of-the-art methods.
2. Enhances minority-class performance in federated settings.
3. Supports fully unsupervised local adaptation.

Abstract

Test-time adaptation (TTA) in federated learning (FL) is crucial for handling unseen data distributions across clients, particularly under domain shifts and skewed class distributions. Class imbalance (CI) remains a fundamental challenge in FL, where rare but critical classes are often severely underrepresented in individual client datasets. Although prior work has addressed CI during training through reliable aggregation and local class distribution alignment, these methods typically rely on access to labeled data or coordination among clients, and none addresses unsupervised adaptation to dynamic domains or distribution shifts at inference time under federated CI constraints. After revealing that state-of-the-art TTA methods fail at federated client adaptation in CI scenarios, we propose pFedBBN, a personalized federated test-time adaptation framework that employs balanced batch…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

Originality: The paper introduces a novel perspective on federated test-time adaptation (FTTA) by explicitly tackling class imbalance, a rarely addressed challenge in this context. The proposed Balanced Batch Normalization (BBN) and class-wise adaptive normalization represent an original and conceptually elegant modification to standard BN, ensuring fair adaptation across majority and minority classes. Integrating this with a similarity-based personalized aggregation using BN statistics is a cre
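The similarity-based personalized aggregation mentioned in this review can be sketched roughly as follows. The review text only says that BN statistics drive the similarity; the use of cosine similarity, the softmax weighting, the `temperature` parameter, and the function names are all assumptions made for illustration, not the authors' confirmed procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two flat vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def personalized_aggregate(client_stats, client_params, temperature=1.0):
    """Build one personalized parameter vector per client (illustrative).

    client_stats: list of flat BN-statistic vectors, one per client.
    client_params: list of flat parameter vectors, one per client.
    temperature: hypothetical knob; smaller values favor the most similar clients.
    """
    dim = len(client_params[0])
    personalized = []
    for stats_i in client_stats:
        # Similarity of this client's BN statistics to every client's.
        sims = [cosine(stats_i, stats_j) for stats_j in client_stats]
        # Numerically stable softmax turns similarities into weights.
        top = max(sims)
        exps = [math.exp((s - top) / temperature) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of all clients' parameters.
        personalized.append([
            sum(w * p[d] for w, p in zip(weights, client_params))
            for d in range(dim)
        ])
    return personalized
```

Under this sketch, clients whose BN statistics look alike share more of each other's parameters, while a client with unusual statistics mostly keeps its own; no raw data leaves any client, only statistics and parameters.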

Weaknesses

Limited empirical scope: The experiments are confined to CIFAR-10-C and CIFAR-100-C, synthetic corruption benchmarks that only approximate domain shifts. This leaves uncertainty about the framework’s robustness in real-world federated settings (e.g., medical imaging, speech, or sensor data). Evaluations on larger or more heterogeneous datasets would better validate generality.

Dependence on pseudo-labels: The class-wise BN statistics rely on pseudo-labels generated by the teacher. Under severe c

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. First to address the combination of class imbalance and federated test-time adaptation.
2. Part of the proposed method, the balanced batch normalization, can be integrated with prior works for improved performance.
3. Experiments show improved results compared to baseline methods.

Weaknesses

1. The work lacks an explanation of the concrete design of the baseline methods; thus, it is not clear how the proposed method is novel compared to prior methods.
2. The idea of managing class-wise statistics and dealing with batch normalization is similar to several more recent related works dealing with class imbalance problems [1,2,3,4,5], but the relevant works are not cited or discussed.
3. The algorithm pseudo-code is not provided.
4. The experiment is conducted on limited datasets of CIFA

Reviewer 03 · Rating 2 · Confidence 3

Strengths

The studied topic is interesting and important. The experiment results seem to be encouraging.

Weaknesses

1. The literature review is not comprehensive. Data heterogeneity, particularly in the form of data imbalance and/or long-tailed distributions, has been extensively studied by the community, e.g., [1] and [2].
2. The presentation of the methodology section is very sloppy, making it very hard to accurately understand the details. The experiment sections fail to disclose necessary details. Therefore, it is hard to assess the contribution and novelty. Please see the `Question` sections for details.

[1] Shang,

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Privacy-Preserving Technologies in Data · Machine Learning in Healthcare