DOMBA: Double Model Balancing for Access-Controlled Language Models via Minimum-Bounded Aggregation

Tom Segal; Asaf Shabtai; Yuval Elovici

arXiv:2408.11121·cs.LG·February 11, 2025

TL;DR

DOMBA introduces a novel double model balancing technique that combines models trained on different access levels to ensure data security while maintaining high utility in access-controlled language models.

Contribution

It proposes a new aggregation method, the min-bounded average, to effectively balance utility and security in access-controlled LLMs.
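The summary above names the min-bounded average but does not define it. As an illustration only, the sketch below uses a power mean with a negative exponent, which is one function whose value never exceeds a constant times the smaller of its two inputs (for exponent p < 0 the constant is 2^(-1/p), so 2 for the harmonic mean). The choice of function, the names `power_mean` and `aggregate`, the renormalization step, and the toy distributions are all assumptions made for this example, not the authors' implementation.

```python
def power_mean(a: float, b: float, p: float = -1.0) -> float:
    """Power mean of two non-negative numbers with a negative exponent.

    For p < 0 the result is at most 2 ** (-1 / p) * min(a, b),
    so a token that either model considers unlikely stays unlikely.
    """
    if p >= 0:
        raise ValueError("p must be negative for the min-bounded property")
    if a <= 0.0 or b <= 0.0:
        # Limit of the power mean as either input goes to zero.
        return 0.0
    return ((a ** p + b ** p) / 2.0) ** (1.0 / p)


def aggregate(dist_a, dist_b, p: float = -1.0):
    """Combine two next-token distributions token by token, then renormalize.

    Renormalization is an assumption of this sketch so that the
    result is again a probability distribution.
    """
    scores = [power_mean(x, y, p) for x, y in zip(dist_a, dist_b)]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("the two distributions share no supported token")
    return [s / total for s in scores]


# Hypothetical next-token distributions from two submodels,
# each trained on a different subset of access levels.
model_a = [0.70, 0.29, 0.01]  # token 0 is likely only under model A
model_b = [0.01, 0.29, 0.70]  # token 2 is likely only under model B
combined = aggregate(model_a, model_b)
```

In this toy case the token both models agree on (index 1) ends up with most of the probability mass, while tokens that only one model favors are pushed down. That is the intuition behind bounding the aggregate by the minimum: content seen by only one submodel cannot dominate the output.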

Findings

01. DOMBA effectively safeguards sensitive information.
02. It achieves utility comparable to non-secure models.
03. The method is supported by rigorous mathematical analysis.

Abstract

The utility of large language models (LLMs) depends heavily on the quality and quantity of their training data. Many organizations possess large data corpora that could be leveraged to train or fine-tune LLMs tailored to their specific needs. However, these datasets often come with access restrictions that are based on user privileges and enforced by access control mechanisms. Training LLMs on such datasets could result in exposure of sensitive information to unauthorized users. A straightforward approach for preventing such exposure is to train a separate model for each access level. This, however, may result in low utility models due to the limited amount of training data per model compared to the amount in the entire organizational corpus. Another approach is to train a single LLM on all the data while limiting the exposure of unauthorized information. However, current…

Code & Models

Repositories

ppo1/domba (PyTorch, official)

Videos

DOMBA: Double Model Balancing for Access-Controlled Language Models via Minimum-Bounded Aggregation

Taxonomy

Topics: Natural Language Processing Techniques