Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation
Hong Chen, Pengcheng Wu, Yuanguo Lin, Peilin Zhao, Xiuze Zhou, Fan Lin, Han Yu

TL;DR
This paper introduces Federated Nested Learning (FedNL), a new framework that redefines federated learning as a nested optimization problem, enabling efficient test-time adaptation on non-IID data with constant memory.
Contribution
FedNL reformulates federated learning as a three-level nested optimization system and incorporates Titans-based linear attention for lightweight, zero-shot test-time adaptation.
Findings
FedNL achieves competitive short-context reasoning performance.
FedNL improves long-context retrieval and streaming cross-entropy.
FedNL maintains constant inference memory during adaptation.
Abstract
We rethink Federated Learning (FL) from a nested learning perspective, framing the core challenge as how to collaboratively learn optimization rules, not just static models, to tackle Non-IID client data. To address this, we propose Federated Nested Learning (FedNL), a novel framework that reformulates FL as a three-level nested optimization system. FedNL embeds Titans-based linear attention into FL, enabling clients to perform lightweight, zero-shot test-time adaptation by treating a delta rule as an online gradient step. Experiments on Non-IID MMLU and long-context benchmarks show that FedNL achieves competitive performance in short-context reasoning, enhances the performance of long-context retrieval and streaming Cross-Entropy, and maintains constant inference memory.
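The abstract's idea of "treating a delta rule as an online gradient step" can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the names `delta_rule_step` and `adapt_on_stream`, the learning rate, and the dimensions are all hypothetical, and the sketch leaves out the federated aggregation and any Titans-specific components. It only shows that the classic delta-rule update of a linear associative memory `M` equals one gradient step on the loss 0.5 * ||M k - v||^2, and that the memory keeps a fixed shape however long the stream is.

```python
import numpy as np

def delta_rule_step(M, k, v, lr):
    """One delta-rule update: a gradient step on 0.5 * ||M @ k - v||^2 w.r.t. M."""
    error = M @ k - v                 # prediction error for this (key, value) pair
    grad = np.outer(error, k)         # gradient of the loss with respect to M
    return M - lr * grad

def adapt_on_stream(keys, values, lr=0.5):
    """Fold a stream of (key, value) pairs into a fixed-size memory matrix."""
    d_v, d_k = values.shape[1], keys.shape[1]
    M = np.zeros((d_v, d_k))          # state size depends only on d_v and d_k
    for k, v in zip(keys, values):
        M = delta_rule_step(M, k, v, lr)
    return M

# Hypothetical toy stream: the same unit-norm key/value pair seen 20 times.
rng = np.random.default_rng(0)
k = rng.normal(size=8)
k /= np.linalg.norm(k)
v = rng.normal(size=4)
keys = np.tile(k, (20, 1))
values = np.tile(v, (20, 1))

M = adapt_on_stream(keys, values)
recall_error = np.linalg.norm(M @ k - v)  # shrinks by a factor (1 - lr) per step
```

Because the state is a single `d_v × d_k` matrix, memory use during adaptation does not grow with the number of tokens processed. That is the sense in which this kind of update can keep inference memory constant.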
