DOMBA: Double Model Balancing for Access-Controlled Language Models via Minimum-Bounded Aggregation
Tom Segal, Asaf Shabtai, Yuval Elovici

TL;DR
DOMBA (double model balancing) combines models trained on different access levels so that access-controlled language models protect restricted data while retaining high utility.
Contribution
The paper proposes a new aggregation method, the min-bounded average, to balance utility and security in access-controlled LLMs.
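This summary does not give the formal definition of the min-bounded average, so the following is only a hypothetical illustration of the general idea that the name suggests, not the authors' formula. It assumes two models that each output a next-token probability distribution, takes their elementwise mean, and caps each entry at a constant multiple of the smaller of the two probabilities before renormalizing. The function name `min_bounded_average`, the `bound` parameter, and the example distributions are all assumptions made for this sketch.

```python
def min_bounded_average(p, q, bound=2.0):
    """Hypothetical aggregation of two next-token distributions.

    Each entry is the arithmetic mean of p[i] and q[i], capped at
    bound * min(p[i], q[i]); the capped values are then renormalized.
    This is an illustrative assumption, not the paper's definition.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    capped = [min((a + b) / 2.0, bound * min(a, b)) for a, b in zip(p, q)]
    total = sum(capped)
    if total == 0.0:
        raise ValueError("distributions share no common support")
    return [c / total for c in capped]


# Made-up distributions over a three-token vocabulary.
# Token 0 is likely under model A only, e.g. because only A saw it in training.
p_model_a = [0.70, 0.20, 0.10]
p_model_b = [0.01, 0.59, 0.40]

combined = min_bounded_average(p_model_a, p_model_b)
print(combined)
```

Under these assumptions, a token that only one model considers likely stays unlikely in the combined distribution, because the cap ties its pre-normalization weight to the lower of the two probabilities. Tokens that both models support keep close to their plain average.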
Findings
DOMBA effectively safeguards sensitive information.
It achieves utility comparable to non-secure models.
The method is supported by rigorous mathematical analysis.
Abstract
The utility of large language models (LLMs) depends heavily on the quality and quantity of their training data. Many organizations possess large data corpora that could be leveraged to train or fine-tune LLMs tailored to their specific needs. However, these datasets often come with access restrictions that are based on user privileges and enforced by access control mechanisms. Training LLMs on such datasets could result in exposure of sensitive information to unauthorized users. A straightforward approach for preventing such exposure is to train a separate model for each access level. This, however, may result in low utility models due to the limited amount of training data per model compared to the amount in the entire organizational corpus. Another approach is to train a single LLM on all the data while limiting the exposure of unauthorized information. However, current…
Taxonomy
Topics: Natural Language Processing Techniques
