Effective and Efficient Federated Tree Learning on Hybrid Data
Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song

TL;DR
This paper introduces HybridTree, a federated learning method for hybrid data that efficiently builds decision trees across parties with different features and samples, achieving high accuracy with low communication overhead.
Contribution
HybridTree is the first federated tree learning approach designed for hybrid data, incorporating theoretical insights and a layer-level training solution to reduce communication.
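To make the layer-level idea concrete, here is a minimal sketch of a tree whose upper layer splits on one party's features and whose lower layer splits on another party's features. It is an illustration only, not the authors' algorithm: the party roles ("host" and "guest"), the feature names, the thresholds, and the leaf values are all hypothetical, and training, encryption, and the actual message protocol are not shown.

```python
# Hypothetical two-stage tree. The upper layer uses a host feature,
# the lower layer uses a guest feature. All values are made up.

# Upper layer, held by the host: one split that routes to a node ID.
HOST_SPLIT = {"feature": "age", "threshold": 40,
              "left": "node_L", "right": "node_R"}

# Lower layer, held by the guest: one small subtree per routed node ID.
GUEST_SUBTREES = {
    "node_L": {"feature": "purchases", "threshold": 3,
               "left": -0.5, "right": 0.2},
    "node_R": {"feature": "purchases", "threshold": 10,
               "left": 0.1, "right": 0.8},
}


def host_route(host_row):
    """Evaluate the host's split and return only the resulting node ID."""
    if host_row[HOST_SPLIT["feature"]] < HOST_SPLIT["threshold"]:
        return HOST_SPLIT["left"]
    return HOST_SPLIT["right"]


def guest_leaf(node_id, guest_row):
    """Continue from the routed node using the guest's own feature."""
    split = GUEST_SUBTREES[node_id]
    if guest_row[split["feature"]] < split["threshold"]:
        return split["left"]
    return split["right"]


def predict(host_row, guest_row):
    """Combine both stages; only the node ID crosses the party boundary."""
    node_id = host_route(host_row)
    return guest_leaf(node_id, guest_row)


if __name__ == "__main__":
    print(predict({"age": 30}, {"purchases": 5}))
    print(predict({"age": 50}, {"purchases": 5}))
```

In this toy layout the two parties exchange a node ID per sample rather than raw feature values, which hints at why working one layer at a time could keep communication low. How HybridTree actually schedules and protects these exchanges is described in the paper, not here.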
Findings
Achieves comparable accuracy to centralized models.
Up to 8 times faster than baseline methods.
Requires minimal communication overhead.
Abstract
Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing federated learning studies focus on either horizontal or vertical data settings, where the data of different parties are assumed to be from the same feature or sample space. In practice, a common scenario is the hybrid data setting, where data from different parties may differ both in the features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level…
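The difference between horizontal, vertical, and hybrid settings described in the abstract can be sketched with a toy example. The party names, sample IDs, and feature names below are hypothetical and chosen only for illustration; the classification rule is a simplification of the definitions in the abstract, not something taken from the paper.

```python
# Toy illustration of horizontal, vertical, and hybrid data partitions.
# All party names, sample IDs, and feature names are hypothetical.


def classify_setting(a, b):
    """Label how two parties' data relate, based on sample and feature sets."""
    same_features = set(a["features"]) == set(b["features"])
    same_samples = set(a["samples"]) == set(b["samples"])
    if same_features and same_samples:
        return "identical"
    if same_features:
        return "horizontal"  # same feature space, different samples
    if same_samples:
        return "vertical"    # same sample space, different features
    return "hybrid"          # both features and samples differ


def shared_samples(a, b):
    """Sample IDs that both parties hold, sorted for readability."""
    return sorted(set(a["samples"]) & set(b["samples"]))


# A party holding two features for six samples.
host = {"samples": [1, 2, 3, 4, 5, 6], "features": ["age", "income"]}
# Two other parties with different features and only partial sample overlap.
guest_a = {"samples": [1, 2, 3], "features": ["purchases"]}
guest_b = {"samples": [4, 5, 6, 7], "features": ["clicks", "purchases"]}

# Reference cases for the two classical settings.
bank_1 = {"samples": [1, 2], "features": ["age", "income"]}
bank_2 = {"samples": [3, 4], "features": ["age", "income"]}
shop = {"samples": [1, 2], "features": ["purchases"]}

if __name__ == "__main__":
    print(classify_setting(bank_1, bank_2))   # horizontal
    print(classify_setting(bank_1, shop))     # vertical
    print(classify_setting(host, guest_a))    # hybrid
    print(shared_samples(host, guest_b))      # [4, 5, 6]
```

In the hybrid case, no pair of parties shares a full feature space or a full sample space, so neither a purely horizontal nor a purely vertical method applies directly.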
Peer Reviews
Decision: ICLR 2024 poster
1. The main innovation of this work lies in the development of a tree transformation strategy that can reorder split features to accommodate a federated learning environment. This is particularly relevant for scenarios where data privacy and distribution are concerns. 2. By introducing a new layer-level training algorithm, HybridTree, they address the integration of knowledge from multiple participants (referred to as "guests") in the federated model without compromising on data privacy. 3. Th
Even though the proposed method can handle hybrid features, the features have to be tabular data. This might not be the constraint of this paper but rather the limitation of the tree-based methods. However, maybe the authors can consider the scenarios where clients have multi-modal data, where the data modalities are hybrid across clients.
1. This work targets a hybrid FL setting, where the vertical FL setting is integrated with the horizontal setting. The setting has practical applications in many industrial areas but is less studied in research. 2. The observation of the existence of meta-rules is novel and leads to the development of a communication-efficient tree-based FL algorithm.
1. The paper is not very well presented and is hard to follow. First of all, it is unclear how the guest parties relate to one another in the hybrid setting considered. In the introduction, it appears that they share the same feature space but have different sample IDs; however, in Section 3.1 they appear to have different dimensions and unclear alignment. It is suggested that the paper properly define the problem setting. A figure on how data is partitioned by different parties would also he
1. The problem studied in this paper is interesting and important. This work proposes federated tree models on hybrid data, which expands the scope of current FL frameworks and makes an important contribution to the FL community. 2. This paper is original and technically sound. The main claims regarding the proposed setting are well supported in the methodology and experimental parts.
1. The paper is not easy to follow. 2. Section 4.1 would benefit from more analysis or explanation of the training process, so that readers can understand how HybridTree handles hybrid data and how this contributes to the improvement. 3. Although the related work section gives a relatively thorough literature review, I would prefer to see a discussion of how this work relates to the specific methods reviewed. 4. Some minor errors, see below.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications
Methods: Focus
