Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng, Yuanyuan Chen, Zhi Huang, Han Yu, Xiaoxiao Li

TL;DR
This paper introduces FedTextGrad, a novel federated learning paradigm that utilizes textual gradients for prompt optimization, expanding FL applicability to non-numerical data and addressing key challenges in aggregation and information retention.
Contribution
We propose FedTextGrad, a new FL framework for textual gradients, and provide experimental insights, challenges, and improvements based on information density principles.
Findings
Proper tuning of local steps is crucial for FL training.
Retaining essential information during prompt aggregation is challenging.
Leveraging the Uniform Information Density principle improves prompt summarization.
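One common way to make the Uniform Information Density idea concrete is to measure how evenly surprisal (negative log-probability) is spread across the tokens of a text: lower variance means information is distributed more uniformly. The sketch below is illustrative only and is not the paper's summarization method. It assumes a simple add-one-smoothed unigram model estimated from a reference token list, and the names `token_surprisals` and `uid_variance` are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def token_surprisals(tokens, reference_tokens):
    """Per-token surprisal in bits under an add-one-smoothed unigram model.

    Assumption: a unigram model stands in for the language model that a
    real UID measurement would use.
    """
    counts = Counter(reference_tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens
    total = len(reference_tokens)
    return [-math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens]


def uid_variance(tokens, reference_tokens):
    """Variance of surprisal; lower values indicate more uniform density."""
    return pvariance(token_surprisals(tokens, reference_tokens))


reference = "the server aggregates the client prompts".split()
even = uid_variance(["server", "client"], reference)   # equally frequent tokens
uneven = uid_variance(["the", "zebra"], reference)     # frequent vs. unseen token
print(even, uneven)
```

Under this proxy, a candidate summary whose surprisal variance is lower would be treated as spreading its information more evenly across tokens.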
Abstract
Recent studies highlight the promise of LLM-based prompt optimization, especially with TextGrad, which automates "differentiation" via text and backpropagates textual feedback. This approach facilitates training in various real-world applications that do not support numerical gradient propagation or loss calculation. In this paper, we systematically explore the potential and challenges of incorporating textual gradients into Federated Learning (FL). Our contributions are fourfold. Firstly, we introduce a novel FL paradigm, Federated Textual Gradient (FedTextGrad), that allows clients to upload locally optimized prompts derived from textual gradients, while the server aggregates the received prompts. Unlike traditional FL frameworks, which are designed for numerical aggregation, FedTextGrad is specifically tailored for handling textual data, expanding the applicability of FL to a broader…
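The client/server loop described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `textual_step` and `summarize` are hypothetical callables standing in for LLM calls (a TextGrad-style prompt update and a server-side summarizer), and joining client prompts with newlines before summarizing is an assumed aggregation strategy. None of these names come from the TextGrad library.

```python
from typing import Callable, List, Sequence


def local_update(prompt: str, batches: Sequence[str], local_steps: int,
                 textual_step: Callable[[str, str], str]) -> str:
    """Run `local_steps` prompt updates on one client's data."""
    if not batches:
        return prompt  # nothing to learn from; keep the global prompt
    for step in range(local_steps):
        # Hypothetical LLM call: critique the prompt on a batch and revise it.
        prompt = textual_step(prompt, batches[step % len(batches)])
    return prompt


def aggregate(prompts: List[str], summarize: Callable[[str], str]) -> str:
    """Server side: merge client prompts into one global prompt.

    Assumption: concatenate, then summarize to respect token limits.
    """
    return summarize("\n".join(prompts))


def federated_round(global_prompt: str, client_data: List[Sequence[str]],
                    local_steps: int,
                    textual_step: Callable[[str, str], str],
                    summarize: Callable[[str], str]) -> str:
    """One communication round: broadcast, local updates, aggregation."""
    client_prompts = [
        local_update(global_prompt, data, local_steps, textual_step)
        for data in client_data
    ]
    return aggregate(client_prompts, summarize)


# Toy stand-ins so the sketch runs without an LLM.
toy_step = lambda prompt, batch: f"{prompt} [{batch}]"
toy_summarize = lambda text: text
print(federated_round("P", [["x1", "x2"], ["y1"]], 2, toy_step, toy_summarize))
```

The `local_steps` argument is the knob the Findings refer to, and `summarize` is where information retention during aggregation becomes the bottleneck.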
Peer Reviews
Decision: ICLR 2025 Poster
1) The introduction of FedTextGrad as a framework to incorporate textual gradients into FL represents a novel approach, particularly valuable in settings where numerical gradients are unavailable, such as black-box LLM applications. 2) The paper addresses a core challenge in aggregating textual data—preserving essential context without exceeding token limits—by proposing a UID-based summarization approach. This innovative method helps maintain critical information balance across prompts, solvin
1) The paper identifies privacy risks with textual gradients but does not provide or test concrete methods to protect sensitive information, which is crucial for FL applications in privacy-sensitive domains. 2) The experiments are primarily conducted on a few large LLMs, restricting insights into how FedTextGrad performs across diverse architectures, particularly smaller models in resource-constrained settings. 3) The UID-based summarization method helps with prompt aggregation but has limitatio
- Presentation of a novel framework for handling prompt-based learning in a federated learning context. - Numerous insights on learning in a federated learning setting without numerical gradients.
- Lack of comparisons with previous work on federated learning.
This paper proposes a concept called FedTextGrad for optimizing large language models by integrating text gradients. This method utilizes text feedback for model optimization in a federated environment, extending the application of federated learning to areas where numerical gradients are impractical or infeasible. This paper identifies and addresses key challenges in joint text gradient aggregation, such as maintaining basic information in distributed updates and managing prompt sizes to accomm
Although the experiments in the paper are comprehensive, they mainly focus on inference tasks, which may not fully demonstrate the broad applicability of FedTextGrad in various fields. The discussion on the limitations of the FedTextGrad method is somewhat insufficient. Although this article briefly discusses challenges such as prompt length management and information retention, it does not delve into potential limitations or scalability issues that may arise when deployed in larger, more hetero
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: TextGrad
