OvA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data

Dongjin Park; Hasung Yeo; Joon-Woo Lee

arXiv:2511.05028·cs.LG·November 10, 2025


Open Access · 3 Reviews

TL;DR

OvA-LP is a novel federated fine-tuning framework that effectively suppresses client drift at its source, significantly improving robustness and efficiency in non-IID data scenarios.

Contribution

It introduces OvA-LP, the first framework explicitly designed to reduce drift during federated fine-tuning by decoupling logits and preserving feature geometry.
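One standard way to read "decoupling logits" is to compare the per-logit gradients of a softmax head and a one-vs-all head. The derivation below is a textbook identity, not taken from the paper; the symbols $z_k$ (logit for class $k$), $y_k$ (one-hot target), and $\sigma$ (logistic sigmoid) are notation assumed here for illustration.

```latex
% Softmax cross-entropy: every logit enters each gradient through the normalizer.
\ell_{\mathrm{CE}}(z, y) = -\sum_{k} y_k \log \frac{e^{z_k}}{\sum_{j} e^{z_j}},
\qquad
\frac{\partial \ell_{\mathrm{CE}}}{\partial z_k} = \frac{e^{z_k}}{\sum_{j} e^{z_j}} - y_k .

% One-vs-all (per-class binary cross-entropy): the gradient for class k depends on z_k only.
\ell_{\mathrm{OvA}}(z, y) = -\sum_{k} \Bigl[ y_k \log \sigma(z_k) + (1 - y_k) \log\bigl(1 - \sigma(z_k)\bigr) \Bigr],
\qquad
\frac{\partial \ell_{\mathrm{OvA}}}{\partial z_k} = \sigma(z_k) - y_k .
```

Under the softmax loss, a client that sees only a few classes still pushes the logits of absent classes through the shared normalizer, whereas in the one-vs-all loss each class's update is independent of the other logits.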

Findings

1. Retains 95.9% of IID accuracy on CIFAR-100 with 100 clients.
2. Outperforms state-of-the-art FFT baselines significantly under non-IID conditions.
3. Maintains robustness under label noise and reduces per-round computational cost.

Abstract

Federated fine-tuning (FFT) adapts foundation models to decentralized data but remains fragile under heterogeneous client distributions due to local drift, i.e., client-level update divergences that induce systematic bias and amplified variance in the global model. Existing aggregation and personalization methods largely correct drift post hoc, which proves brittle under extreme non-IID conditions. We introduce OvA-LP, a minimalist framework that is, to our knowledge, the first explicitly designed to suppress drift at its source within the PEFT-based FFT paradigm. OvA-LP combines linear probing on a frozen encoder with a one-vs-all head and a simple two-stage procedure, preserving pretrained feature geometry and decoupling logits to prevent the mechanisms that amplify drift. On CIFAR-100 with 100 clients, averaged over shard-1, shard-2, and Bernoulli-Dirichlet partitions, OvA-LP retains…
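The combination of a frozen encoder, a linear probe, and a one-vs-all head described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoder outputs are assumed to be precomputed feature arrays, the aggregation is plain weighted averaging of head parameters, and the paper's two-stage procedure is not reproduced because its details are not given here. All function names (`ova_loss_and_grad`, `local_update`, `fedavg`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def ova_loss_and_grad(W, b, feats, labels, num_classes):
    """Mean one-vs-all loss (per-class binary cross-entropy) and its gradients.

    feats:  (n, d) features from a frozen encoder (assumed precomputed).
    labels: (n,) integer class labels.
    W, b:   (d, C) weights and (C,) biases of the linear head.
    """
    targets = np.eye(num_classes)[labels]            # (n, C) one-hot targets
    probs = sigmoid(feats @ W + b)                   # independent per-class probabilities
    eps = 1e-12                                      # guards log(0)
    bce = targets * np.log(probs + eps) + (1.0 - targets) * np.log(1.0 - probs + eps)
    loss = -bce.sum(axis=1).mean()
    err = (probs - targets) / feats.shape[0]         # d(loss)/d(logits)
    return loss, feats.T @ err, err.sum(axis=0)


def local_update(W, b, feats, labels, num_classes, lr=0.1, steps=20):
    """A client's local training: gradient descent on the head only."""
    W, b = W.copy(), b.copy()
    for _ in range(steps):
        _, grad_W, grad_b = ova_loss_and_grad(W, b, feats, labels, num_classes)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b


def fedavg(client_params, weights):
    """Weighted average of client head parameters (plain FedAvg, an assumption)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    W = sum(w * p[0] for w, p in zip(weights, client_params))
    b = sum(w * p[1] for w, p in zip(weights, client_params))
    return W, b
```

In a round, the server would send `(W, b)` to each client, each client would call `local_update` on its own features and labels, and the server would combine the results with `fedavg`, weighting by client sample counts. Because column `k` of the gradient depends only on logit `k`, changing one class's weights leaves the gradients of the other classes untouched.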

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. **Clear Motivation:** The design is well-grounded in a bias-variance decomposition. Each component is clearly justified as targeting a specific source of drift: linear probing limits feature-skew bias, OvA heads eliminate label-skew bias, and the two-stage training controls variance.
2. **Methodological Simplicity:** The paper presents a "minimalist" framework that combines three relatively simple components: linear probing on a frozen encoder, one-vs-all (OvA) heads, and a two-stage train

Weaknesses

1. **Unfair Baseline Comparison:** This is the most critical issue. The paper compares OvA-LP, which only trains a linear head on a frozen encoder, against SOTA baselines like PFPT and FFT-MoE. These baselines are Parameter-Efficient Fine-Tuning (PEFT) methods that adapt the model (e.g., via prompt-tuning or MoE layers). The "local drift" that the paper claims to solve is a phenomenon that arises precisely because the baselines are fine-tuning the shared encoder on heterogeneous local data. OvA

Reviewer 02 · Rating 4 · Confidence 3

Strengths

**1. Source-Level Philosophy:** The paper introduces a "source-level" perspective on preventing client drift. Instead of adjusting aggregation or adding personalization after the fact, OvA-LP stops drift before it begins. This proactive approach is supported by a clear bias-variance framing that connects theory with observed results.
**2. Strong Theoretical Motivation:** OvA-LP is grounded in a bias-variance decomposition of federated gradients, showing how each component addresses a specific c

Weaknesses

**1. Experimental Setup Clarification (Major):** It is unclear whether the comparisons between OvA-LP and the baselines (FFT-MoE, PFPT) are conducted on the same dataset and under identical settings. Although the paper mentions that baseline methods are reproduced using their original architectures and training protocols, Table 7 shows different datasets and client counts for OvA-LP compared to the baselines. Moreover, Table 8 reports an active client ratio of 10% for PFPT, while OvA-LP uses 100

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- **Strong Empirical Performance**.
- **High Efficiency**.
- **Simplicity and Modularity**.

Weaknesses

1. **Incremental Contribution:** The core components (linear probing, OvA heads, two-stage training) are all established techniques. The paper fails to demonstrate novel synergistic mechanisms beyond their combination.
2. **Superficial Theoretical Analysis:** The bias-variance decomposition remains qualitative. It lacks formal proofs or bounds to substantiate claims about bias reduction or variance control.
3. **Limited Experimental Validation:**
   - Scope is narrow (only vision tasks, ViT encode


Taxonomy

Topics: Data Stream Mining Techniques · Privacy-Preserving Technologies in Data · Machine Learning and Data Classification