# The Capacity of Private Information Retrieval from Byzantine and   Colluding Databases

**Authors:** Karim Banawan, Sennur Ulukus

arXiv: 1706.01442 · 2017-06-06

## TL;DR

This paper derives the exact capacity for private information retrieval from multiple databases that may be outdated or malicious, accounting for collusion and errors, and extends robust PIR schemes to handle Byzantine errors.

## Contribution

It provides the first information-theoretic capacity characterization for PIR with Byzantine and colluding databases, extending robust PIR schemes to correct Byzantine errors.

## Key findings

- Capacity is reduced by a factor of (N-2B)/N due to Byzantine databases.
- Exact capacity formula derived for the case 2B+T < N.
- The scheme extends robust PIR to correct Byzantine errors.

## Abstract

We consider the problem of single-round private information retrieval (PIR) from $N$ replicated databases. We consider the case when $B$ databases are outdated (unsynchronized), or even worse, adversarial (Byzantine), and therefore, can return incorrect answers. In the PIR problem with Byzantine databases (BPIR), a user wishes to retrieve a specific message from a set of $M$ messages with zero-error, irrespective of the actions performed by the Byzantine databases. We consider the $T$-privacy constraint in this paper, where any $T$ databases can collude, and exchange the queries submitted by the user. We derive the information-theoretic capacity of this problem, which is the maximum number of \emph{correct symbols} that can be retrieved privately (under the $T$-privacy constraint) for every symbol of the downloaded data. We determine the exact BPIR capacity to be $C=\frac{N-2B}{N}\cdot\frac{1-\frac{T}{N-2B}}{1-(\frac{T}{N-2B})^M}$, if $2B+T < N$. This capacity expression shows that the effect of Byzantine databases on the retrieval rate is equivalent to removing $2B$ databases from the system, with a penalty factor of $\frac{N-2B}{N}$, which signifies that even though the number of databases needed for PIR is effectively $N-2B$, the user still needs to access the entire $N$ databases. The result shows that for the unsynchronized PIR problem, if the user does not have any knowledge about the fraction of the messages that are mis-synchronized, the single-round capacity is the same as the BPIR capacity. Our achievable scheme extends the optimal achievable scheme for the robust PIR (RPIR) problem to correct the \emph{errors} introduced by the Byzantine databases as opposed to \emph{erasures} in the RPIR problem. Our converse proof uses the idea of the cut-set bound in the network coding problem against adversarial nodes.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.01442/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1706.01442/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1706.01442/full.md

---
Source: https://tomesphere.com/paper/1706.01442