Building a Sybil-Resilient Digital Community Utilizing Trust-Graph Connectivity

Ouri Poupko; Gal Shahaf; Ehud Shapiro; and Nimrod Talmon

arXiv:1901.00752·cs.SI·December 11, 2023

TL;DR

This paper proposes trust-graph connectivity methods to prevent sybil identities from infiltrating digital communities, enabling indefinite growth while maintaining security against malicious actors.

Contribution

It proposes two alternative tools, graph conductance and vertex expansion, to limit byzantine penetration and allow safe, indefinite community growth.

Findings

01. Trust-graph connectivity can prevent sybil infiltration

02. High conductance or vertex expansion ensures safe community growth

03. Maintaining less than one-third byzantines enables Byzantine Agreement

Equations

$$\deg(x):=|\{y\in V\mid(x,y)\in E\}|$$

$$\mathrm{vol}(A):=\sum_{x\in A}\deg(x)$$

$$\mathrm{vol}_{A}(B):=\sum_{x\in B}\deg_{A}(x)$$

$$e(A,B)=|\{(x,y)\in E\mid x\in A,\ y\in B\}|$$

$$\Phi_{e}(G)=\min_{\emptyset\neq A\subset V}\frac{e(A,A^{c})}{\min\{\mathrm{vol}(A),\mathrm{vol}(A^{c})\}}$$

$$\partial_{v}(A,B):=\#\{x\in A\mid\exists y\in B\text{ s.t. }(x,y)\in E\}$$

$$\Phi_{v}(G):=\min_{0<|A|\leq\frac{|V|}{2}}\frac{\partial_{v}(A,A^{c})}{|A|}$$

$$\sigma(G)=\frac{|A\cap S|}{|A|}$$

$$\beta(G)=\frac{|A\cap B|}{|A|}$$

$$\frac{|A\cap B|}{|A|}\leq\beta\quad\text{iff}\quad\frac{|A\cap B|}{|A\cap H|}\leq\frac{\beta}{1-\beta}$$

$$\alpha\cdot d\leq\deg_{A_{i}}(v)\leq d\quad\text{for all }v\in A_{i},\ i\in\mathbb{N}$$

$$\beta(G_{1})\leq\beta$$

$$\frac{e(A_{i}\cap H,A_{i}\cap B)}{\mathrm{vol}_{A_{i}}(A_{i}\cap H)}\leq\gamma_{e}$$

$$|A_{i}\setminus A_{i-1}|\leq\delta|A_{i-1}|$$

$$\Phi_{e}(G|_{A_{i}})>\frac{\gamma_{e}}{\alpha}\cdot\left(\frac{1-\beta}{\beta}\right)$$

$$\alpha\cdot d\leq\deg_{A^{\prime}}(v)\leq d\quad\forall v\in A^{\prime}$$

$$\beta(G)+\frac{\delta}{2}\leq\frac{1}{2}$$

$$\frac{e(A^{\prime}\cap H,A^{\prime}\cap B)}{\mathrm{vol}_{A^{\prime}}(A^{\prime}\cap H)}\leq\gamma_{e}$$

$$|A^{\prime}\setminus A|\leq\delta|A|$$

$$\Phi_{e}(G|_{A^{\prime}})>\frac{\gamma_{e}}{\alpha}\cdot\left(\frac{1-\beta}{\beta}\right)$$

$$|A^{\prime}\cap B|\leq|A\cap B|+|A^{\prime}\setminus A|=\beta(G)\cdot|A|+|A^{\prime}|-|A|$$

$$|A^{\prime}\cap B|\leq\frac{(1-\delta)|A|}{2}+|A^{\prime}|-|A|=\frac{|A^{\prime}|}{2}-\frac{\delta|A|}{2}+\frac{|A^{\prime}|}{2}-\frac{|A|}{2}$$

$$|A^{\prime}\cap B|\leq\frac{|A^{\prime}|}{2}-\frac{\delta|A|}{2}+\frac{\delta|A|}{2}=\frac{|A^{\prime}|}{2}$$

$$|A^{\prime}\cap B|\leq|A^{\prime}\cap H|$$

$$\mathrm{vol}_{A^{\prime}}(A^{\prime}\cap B)\geq\sum_{a\in A^{\prime}\cap B}\alpha d=\alpha d|A^{\prime}\cap B|$$

$$\mathrm{vol}_{A^{\prime}}(A^{\prime}\cap H)\geq\alpha d|A^{\prime}\cap H|$$

$$\mathrm{vol}_{A^{\prime}}(A^{\prime}\cap H)\geq\alpha d|A^{\prime}\cap B|$$

$$\min\{\mathrm{vol}(A^{\prime}\cap H),\mathrm{vol}(A^{\prime}\cap B)\}\geq\alpha d|A^{\prime}\cap B|$$

$$\frac{e(A^{\prime}\cap H,A^{\prime}\cap B)}{\alpha d|A^{\prime}\cap B|}\geq\frac{e(A^{\prime}\cap H,A^{\prime}\cap B)}{\min\{\mathrm{vol}(A^{\prime}\cap H),\mathrm{vol}(A^{\prime}\cap B)\}}>\frac{\gamma_{e}}{\alpha}\cdot\left(\frac{1-\beta}{\beta}\right)$$

$$\frac{e(A^{\prime}\cap H,A^{\prime}\cap B)}{d\gamma_{e}|A^{\prime}\cap B|}\geq\frac{1-\beta}{\beta}$$

Full text

A preliminary version of this paper was presented at the 14th International Computer Science Symposium in Russia, July 1-5, 2019, Novosibirsk, Russia [18]. The current version contains more discussion and a clearer presentation of theorems and proofs. It also contains two methods for connectivity measurement, while the preliminary version discussed only one.

Ouri Poupko, Weizmann Institute of Science

Gal Shahaf, Weizmann Institute of Science

Ehud Shapiro, Weizmann Institute of Science

Nimrod Talmon, Ben-Gurion University

Abstract

Preventing fake or duplicate digital identities (aka sybils) from joining a digital community may be crucial to its survival, especially if it utilizes a consensus protocol among its members or employs democratic governance, where sybils can undermine consensus, tilt decisions, or even take over. Here, we explore the use of a trust-graph of identities, with edges representing trust among identity owners, to allow a community to grow indefinitely without increasing its sybil penetration. Since identities are admitted to the digital community based on their trust by existing digital community members, corrupt identities, which may trust sybils, also pose a threat to the digital community. Sybils and their corrupt perpetrators are together referred to as byzantines, and the overarching aim is to limit their penetration into a digital community. We propose two alternative tools to achieve this goal. One is graph conductance, which works under the assumption that honest people are averse to corrupt ones and tend to distrust them. The second is vertex expansion, which relies on the assumption that there are not too many corrupt identities in the community. Of particular interest is keeping the fraction of byzantines below one third, as it would allow the use of Byzantine Agreement [15] for consensus as well as for sybil-resilient social choice [19]. This paper considers incrementally growing a trust graph and shows that, under its key assumptions and additional requirements, including keeping the conductance or vertex expansion of the community trust graph sufficiently high, a community may grow safely, indefinitely.

1 Introduction

The goal of this paper is to identify conditions under which a digital community of predominantly genuine (singular and unique) digital identities [20] may grow without increasing the penetration of sybil (fake or duplicate) digital identities. Our particular context of interest is digital democracy [21, 22], where a sovereign digital community conducts its affairs via egalitarian decision processes; another motivation is the task of growing a permissioned distributed system. Consider an initial digital community with low sybil penetration that wishes to admit new members without admitting too many sybils. As it is not realistic to expect that no sybils will be admitted, the goal is to keep the fraction of sybils below a certain threshold. In a separate paper [19], we show that a digital democracy can tolerate up to one-third sybil penetration and still function democratically. Still, the fewer the sybils, the smaller the supermajority needed to defend against them.

We model a digital community via a trust graph with a vertex for each identity and with edges representing trust relations between the owners of the corresponding identities (formal definitions are given in Section 2). The model considers genuine and sybil identities (cf. [20]), and refers to the genuine identities that do not trust sybils as honest and those that do as corrupt. Furthermore, to describe an admission process that facilitates incremental community growth, the model presents sequences of trust graphs that may result from such a process.

The goal is to identify sufficient conditions on such graphs, for example, the type of identities in the graph, their relative fractions, and their trust relations, under which a community may grow while keeping the fraction of sybils in it low. To achieve this, we use two similar approaches, which differ in the assumptions made on the power of the adversary. The first approach assumes that honest identities tend to trust honest identities rather than corrupt ones, so it is hard for the corrupt ones (the adversary) to create trust edges with honest identities. In this case graph conductance bounds the ratio of sybils in the graph. The second approach assumes that there are not too many corrupt identities, so the adversary's power is limited by its own size. In this case vertex expansion bounds the ratio of sybils in the graph.

1.1 Related work

This section reviews existing work, particularly work that helps clarify how our proposed model differs. A large portion of the literature on sybil attacks (see, for example, [9, 17, 16] and their citations) focuses on sybil detection, where the task is to tell the sybil agents from the honest ones. Of particular interest is the approach initiated by Yu et al. [28], which relies on structural properties of the underlying social network. Yu et al. show how to separate the honest and sybil regions by leveraging the assumption that there are relatively few edges between them. This framework was studied further [26, 7, 23, 24, 25, 4]. As pointed out by Alvisi et al. [2], however, such attempts to recover the entire sybil region may succeed only in instances where the honest region is sufficiently connected, which is rarely the case in actual social networks. Consequently, Alvisi et al. suggest the more modest goal of producing a whitelist of honest vertices in the graph with respect to a given agent; that is, a local sybil detection scheme, in contrast to the global ones proposed before. Another important aspect of our model that is not apparent in existing works is that it divides the identities into three sets (and not merely two): honest, corrupt, and sybil identities.

A problem of a similar flavor is that of corruption detection in networks, posed by Alon et al. [1] and later refined by Jin et al. [14]. This setting, inspired by auditing networks, consists of a graph with each of its vertices being either truthful or corrupt, where the overall goal is to detect the corrupt region. In contrast to the sybil detection problem, the corrupt agents are assumed to be immersed throughout the network, and the setting makes a very restrictive assumption, namely that each agent may accurately determine the true label of its neighbors and report it to a central authority. The authors show how good connectivity properties of the graph allow an approximate recovery of the truthful and corrupt regions.

Note that social networks have some special structure, for example, having low diameters (a.k.a. the small-world phenomenon [10]) or being fragmented into highly connected clusters with low connectivity between different clusters. Moreover, as observed by some researchers [2, 7, 27], the attacker’s inability to maintain sufficiently many attack edges typically results in certain “bottlenecks”, which can be utilized to pinpoint the sybil regions.

1.2 Informal Model

While the problem addressed is related to sybil detection, and indeed we incorporate some of the insights of the works discussed above, here the main goal is different: safe community growth. This work aims to find conditions under which a community may grow without increasing the fraction of hostile members within it, but without necessarily identifying explicitly who is hostile and who is not. An additional difference from existing literature lies in the notions of identity and trust. Specifically, existing works consider identities or agents of only two types, “good” and “bad”, with various names for the two categories. In this work the notion of identities [20] is more refined and, we believe, may be closer to reality.

In particular, this work considers genuine and sybil identities, with the intention that in a real-world scenario these would be characterized by the nature of their representation: genuine identities are singular and unique; all others are sybils (duplicate or fake, namely not corresponding to a single real person). It further distinguishes between two types of genuine identities, based on their behavior: honest, which do not form trust relations with sybils, and corrupt, which do. This behavioral distinction is captured formally in the proposed model. We naturally assume that the owners of corrupt identities are the creators and operators of the sybils and that, in the worst case, all sybils and their corrupt perpetrators may cooperate; hence the model labels them together as byzantines, and aims to limit their fraction within the community.

We thus begin with a unified formal model of such identities and their trust graph, consisting of vertices that represent identities and edges that represent trust relations among the owners of such identities. The exact definition of these trust relations is outside the scope of this paper, but in a related work [20] we consider a spectrum of such trust relations, expressed as mutual sureties among identity owners, and inspect their applicability also to the work presented here. Considering the task of sybil-resilient community growth, the model defines the community history, which aims to capture the incremental changes a community trust graph undergoes in discrete steps. In order to properly characterize identities, the model first employs the basic distinction between genuine and sybil identities. Then, using the community history, it makes a further delicate distinction within genuine identities between honest identities, which never trust sybils, and corrupt identities, which may trust sybils and, furthermore, may cooperate with other corrupt or sybil community members to introduce sybils into the community.

Some assumptions on the power of the sybils and their perpetrators are needed; otherwise there is no hope of achieving our goal. We present two possible alternative assumptions. The first, intuitive assumption is that honest identities are averse to corrupt identities, and hence are not likely to trust them. Trust edges that connect honest and corrupt identities are referred to as attack edges. So, loosely speaking, the assumption is that there are not too many attack edges. We view this assumption as more realistic than the assumption made in related works [1, 14], that truthful agents can identify precisely whether a neighbor is corrupt or not. Figure 1 illustrates the general setting. The second assumption is that there are not too many corrupt identities in the community. This assumption could be realized, for example, by an incentive mechanism that penalizes trusting sybils and rewards honest identities.

1.3 High-level approach

After defining the three types of population in the community, it is clear that the corrupt identities are the adversary to the goal of growing a community without sybils. Without corrupt identities, if the first identity in the community is not a sybil (therefore it is honest), and given that, by definition, honest identities have no trust edges with sybils, then sybils cannot join the community. To gain intuition regarding the two assumptions on the power of adversaries, consider an extreme case, as shown in Figure 2, where the power of the adversary is minimal. The graph on top represents the first assumption, that honest identities are averse to corrupt identities. The graph below represents the second assumption, that there are not too many corrupt identities. In this extreme example the graph is not constrained in any way, which shows that even a weak adversary can add as many sybils as it wants, without additional measures. Our approach will be to measure the connectivity of the graph and derive a bound on the number of byzantines based on this measurement. The example in Figure 2 shows that some simple measurements of connectivity are fruitless for the goal of sybil detection. One such measurement is how dense the graph is, or what is the lower bound on the number of edges within the community. Both graphs show a community where the lower bound on the number of edges is of order $n/2$, and yet the corrupt identities are able to introduce as many sybils as they wish. Another simple measurement is the diameter of the graph, which is also very low in these two communities: 3 at the top and 2 at the bottom.

Yet there is a clear bottleneck in these extreme examples between honest identities and byzantines. The measures that capture precisely this type of bottleneck are conductance, when the bottleneck is in the edges, and vertex expansion, when the bottleneck is in the vertices. The ability to protect the graph from byzantine penetration is based on the key assumption that, while there could be arbitrarily many byzantines wanting to join the growing community, they will have limited connectivity to the current community. Indeed, this observation was applied in the context of sybil detection [2, 28, 26, 27].

In general, while the connectivity of the whole network is typically fairly low, a social network usually contains many clusters that reflect real life communities. The connectivity of the subgraphs restricted to each of these clusters may be high. In that sense, following Alvisi et al. [2], we adopt a local perspective and focus on the connectivity of the community, regardless of the connectivity of the entire network. In contrast to Alvisi et al. [2], however, we are interested in growing the community and not in whitelisting. Unlike the situation treated by Alvisi et al. [2], which can be viewed as whitelisting, initiated at a singleton community (that is, from a single non-sybil vertex), here we consider arbitrarily-large communities and aim to bound, but not detect or eliminate, the sybils in them.

Specifically, our framework makes use of a “target conductance” parameter $\Phi_{e}$, or a “target vertex expansion” parameter $\Phi_{v}$, and aims to grow, that is, admit new members, while retaining a conductance of at least $\Phi_{e}$, or a vertex expansion of at least $\Phi_{v}$, respectively, in the larger community. Assuming that the initial community harbors a limited attack power and a bounded fraction of byzantines, this paper shows how to safely grow the community, indefinitely. The number of members that may join in each increment is a parameter of the algorithm and is related to the bound on byzantines the community maintains. The lower the bound, the more members the community can add in each increment. The bound on byzantines, in turn, depends on the target conductance or vertex expansion that the community maintains. The higher the connectivity of the community, the better the bound on byzantines.

Remark 1.

Note that our methods are deterministic. That is, they guarantee – deterministically – that, if the parameters have certain values and if the assumptions hold, then the conclusion – namely, that the growing community retains a low fraction of sybil penetration – holds.

1.4 Paper structure

The paper begins with graph theory terminology and formal definitions of graph conductance and vertex expansion in Section 2. For simplicity, the framework describes undirected and unweighted graphs. Note, however, that it may easily be modified and applied to directed and weighted graphs as well. The model is formally described in Section 3, by defining types of identities, communities and community history. Then, Section 4 describes the first method, based on the assumption of little trust and the use of conductance, and shows sufficient conditions for safe community growth. Section 5 shows that the framework is compatible with sparse trust graphs and provides some quantitative estimations of its guarantees. Section 6 and Section 7 introduce and analyze the second method, based on the assumption that there are not too many corrupt identities. Section 8 concludes with intriguing open questions for future research.

2 Preliminaries

This section provides some needed definitions regarding graphs and graph connectivity. Refer to any graph theory textbook, like Diestel’s Graph Theory [8] for additional background.

Let $G=(V,E)$ be an undirected graph. The degree of a vertex $x\in V$ is:

$$\deg(x):=|\{y\in V\mid(x,y)\in E\}|$$

$G$ is $d$ -regular if $\deg(x)=d$ holds for each $x\in V$ . The volume of a given subset $A\subseteq V$ is the sum of degrees of its vertices:

$$\mathrm{vol}(A):=\sum_{x\in A}\deg(x)$$

Additionally, denote the subgraph induced on the set of vertices $A$ as $G|_{A}$ , the degree of vertex $x\in A$ in $G|_{A}$ by $\deg_{A}(x)$ , and the volume of a set $B\subseteq A$ in $G|_{A}$ by:

$$\mathrm{vol}_{A}(B):=\sum_{x\in B}\deg_{A}(x)$$

Given two subsets $A,B\subseteq V$ , the size of the cut between $A$ and $B$ is denoted by:

$$e(A,B)=|\{(x,y)\in E\mid x\in A,\ y\in B\}|$$

Definition 1 (Conductance).

Let $G=(V,E)$ be a graph. The conductance of $G$ is defined by:

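$$\Phi_{e}(G)=\min_{\emptyset\neq A\subset V}\frac{e(A,A^{c})}{\min\{\mathrm{vol}(A),\mathrm{vol}(A^{c})\}}$$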

where $A^{c}:=V\setminus A$ is the complement of $A$ .
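The definitions above can be checked on toy graphs with a brute-force routine. The following is a minimal sketch, not the authors' code: the adjacency-dictionary representation and the function names `degree`, `volume`, `cut_size` and `conductance` are assumptions made for illustration, and the enumeration is exponential in $|V|$, so it is only meant for very small graphs.

```python
from itertools import combinations

def degree(adj, x):
    """deg(x): number of neighbours of x in an undirected graph."""
    return len(adj[x])

def volume(adj, A):
    """vol(A): sum of the degrees of the vertices in A."""
    return sum(degree(adj, x) for x in A)

def cut_size(adj, A, B):
    """e(A, B) for disjoint A and B: edges with one endpoint in each set."""
    B = set(B)
    return sum(1 for x in A for y in adj[x] if y in B)

def conductance(adj):
    """Brute-force Phi_e(G): minimise over every nonempty proper subset A."""
    V = list(adj)
    best = None
    for k in range(1, len(V)):
        for subset in combinations(V, k):
            A = set(subset)
            Ac = set(V) - A
            denom = min(volume(adj, A), volume(adj, Ac))
            if denom == 0:
                # Convention assumed here: a side with no incident edges
                # means the graph is disconnected, so report 0.
                return 0.0
            ratio = cut_size(adj, A, Ac) / denom
            best = ratio if best is None else min(best, ratio)
    return best

# Two triangles joined by a single edge: the bridge is the bottleneck.
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(barbell))
```

For the two triangles joined by one edge, the minimising cut is the bridge: one crossing edge over a volume of 7 on each side, giving a conductance of $1/7$.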

Remark 2.

Generally speaking, graph conductance aims to measure the connectivity of the graph by quantifying the minimal cut normalized by the volume of its smaller subset. Conductance should be thought of as the weighted and irregular analogue of edge expansion [12], where both notions are essentially equivalent for regular graphs. To get a quantitative grip of this measure, notice that for all graphs, $\Phi_{e}\in[0,\frac{1}{2}]$ . Intuitively, the conductance of a highly connected graph approaches $\frac{1}{2}$ . For example, cliques and complete bipartite graphs satisfy $\Phi_{e}=\frac{1}{2}$ , while in a poorly connected graph this measure may be arbitrarily small; for example, a disconnected graph satisfies $\Phi_{e}=0$ .

The next sections provide theoretical guarantees on sybil safety, given that one can compute conductance. However, determining the exact conductance of a given graph is known to be coNP-hard [3]. Luckily, the Cheeger inequality [5] provides a direct relation between conductance of a graph and the second eigenvalue of its random walk matrix, which can be calculated in polynomial time, and approximated in nearly linear time. Refer to [12], [13] and [6] for comprehensive surveys regarding efficient algorithms for measuring conductance.
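For reference, one standard textbook form of the Cheeger inequality is given below. The notation is introduced here and is not the paper's: $M$ is the adjacency matrix, $D$ the diagonal degree matrix, and $\lambda_{2}$ the second-smallest eigenvalue of the normalized Laplacian $I-D^{-1/2}MD^{-1/2}$, which equals one minus the second-largest eigenvalue of the random walk matrix $D^{-1}M$. The graph is assumed to have no isolated vertices, and the constants can differ slightly between sources.

$$\frac{\lambda_{2}}{2}\leq\Phi_{e}(G)\leq\sqrt{2\lambda_{2}}$$

Read this way, a spectral computation yields a lower bound on conductance ($\lambda_{2}/2$) that can stand in for the exact value when checking a "sufficiently high conductance" condition, at the cost of being conservative.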

Definition 2 (Inner Boundary Vertex Expansion).

Let $G=(V,E)$ be a graph. Given two subsets $A,B\subseteq V$ , define the inner boundary of $A$ w.r.t. $B$ by

$$\partial_{v}(A,B):=\#\{x\in A\mid\exists y\in B\text{ s.t. }(x,y)\in E\}$$

The inner boundary vertex expansion is then defined by:

$$\Phi_{v}(G):=\min_{0<|A|\leq\frac{|V|}{2}}\frac{\partial_{v}(A,A^{c})}{|A|}$$

Like conductance, vertex expansion also aims to measure the connectivity of the graph, this time by quantifying the minimal vertex cut, rather than the minimal edge cut.

To get a quantitative grip of this measure, note that for all graphs $\Phi_{v}\in[0,1]$ . Intuitively, the vertex expansion of a highly connected graph approaches $1$ . For example, a clique satisfies $\Phi_{v}=1$ , while in a poorly connected graph this measure may be arbitrarily small and a disconnected graph satisfies $\Phi_{v}=0$ . Also note the relation between conductance and vertex expansion, given by $\Phi_{v}/d\leq\Phi_{e}\leq\Phi_{v}$ for $d$ -regular graphs.
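Definition 2 can be explored in the same brute-force fashion. This is an illustrative sketch under assumed names (`inner_boundary`, `vertex_expansion`) and an assumed adjacency-dictionary representation; it enumerates all subsets of size at most $|V|/2$ and is therefore suitable only for toy graphs with at least two vertices.

```python
from itertools import combinations

def inner_boundary(adj, A, B):
    """Number of vertices of A that have at least one neighbour in B."""
    B = set(B)
    return sum(1 for x in A if any(y in B for y in adj[x]))

def vertex_expansion(adj):
    """Brute-force Phi_v(G): minimise over all A with 0 < |A| <= |V| / 2."""
    V = list(adj)
    best = None
    for k in range(1, len(V) // 2 + 1):
        for A in combinations(V, k):
            Ac = set(V) - set(A)
            ratio = inner_boundary(adj, A, Ac) / k
            best = ratio if best is None else min(best, ratio)
    return best

clique4 = {i: {j for j in range(4) if j != i} for i in range(4)}
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
two_edges = {0: {1}, 1: {0}, 2: {3}, 3: {2}}  # a disconnected graph

print(vertex_expansion(clique4), vertex_expansion(barbell),
      vertex_expansion(two_edges))
```

The clique attains the maximum value 1, the disconnected graph attains 0, and the two triangles joined by an edge give $1/3$, because only one of the three vertices in a triangle touches the other side.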

3 Formal Model

3.1 Community Trust Graphs

The relation between people and their identities is rich and multifaceted. For the purpose of this paper, assume that some identities are genuine and others are not, in which case they are called sybils. We represent trust relations among identities via a trust graph, in which vertices represent identities and edges represent trust among identities.

Definition 3.

A trust graph $G=(V,E)$ is an undirected graph with vertices that represent identities and edges that represent trust among them.

The concept of a community trust graph follows, which depicts the community that grows within such a trust graph.

Definition 4.

A community trust graph $G=(A,V,E)$ is a trust graph with vertices $V$ , edges $E$ , and a community $A\subseteq V$ .

3.2 Community Histories and Transitions

The aim of this paper is to find conditions under which a community may grow safely. A graph of identities represents the community. Having established some conditions on a given community, we want to verify that these conditions continue to hold when additional identities are added to the community graph. As the newly added identities threaten these conditions (for example, if the community has a bound on the ratio of corrupt identities, the added identities may be corrupt and push the new community over this bound), the model breaks the growth of the community into steps of incremental growth.

Definition 5 (Community History).

A community history $\mathcal{G}_{V}$ over a set of vertices $V$ is a sequence of community trust graphs $\mathcal{G}_{V}=G_{1},G_{2},\ldots$, where $G_{i}=(A_{i},V,E)$, such that $A_{i}\subset A_{i+1}$ for all $i$. (Footnote: As the set of vertices $V$ is fixed in a community history, it does not explicitly model the birth and death of people; modeling this aspect is the subject of future work.)
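As a small illustration of Definition 5, the nesting condition on the communities can be checked mechanically. The sketch below uses a hypothetical helper `is_community_history` and reads $\subset$ as strict inclusion, which is an assumption about the intended meaning.

```python
def is_community_history(communities, V):
    """Check that A_1, A_2, ... are subsets of V and strictly growing."""
    V = set(V)
    sets = [set(A) for A in communities]
    inside = all(A <= V for A in sets)                    # every A_i lies in V
    growing = all(a < b for a, b in zip(sets, sets[1:]))  # A_i strictly inside A_{i+1}
    return inside and growing

V = {1, 2, 3, 4}
print(is_community_history([{1}, {1, 2}, {1, 2, 3}], V))  # nested growth
print(is_community_history([{1, 2}, {1, 3}], V))          # a member was dropped
```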

3.3 Types of Identities

There are two types of identities: genuine and sybil. Community histories further distinguish between two types of genuine identities, honest and corrupt: an identity is corrupt in a community history if it ever shares an edge with a sybil in this history, and honest if it does not. Sybils and corrupt identities together form the group of byzantines.

The rationale is to bound the number of sybils in the graph, not only at the present but also in the future. Hence, the model bounds also all potential sybil perpetrators, who may establish trust edges with sybils in the future, in an attempt to introduce them into the community. Hence, at any point in time (that is, community graph in a community history), a corrupt identity may be only “corrupt at heart”, with no action as-of-yet to demonstrate its corruption; and the key assumption is that honest identities are averse to corrupt identities even if they are only corrupt at heart.

Below and in the rest of the paper we use disjoint union $A=B\uplus C$ as a shorthand for $A=B\cup C$ , $B\cap C=\emptyset$ .

Definition 6 (Types of identities, Attack edges, Sybil penetration).

Let $V$ be a set of vertices that consists of two disjoint subsets $V=T\uplus S$ of genuine $T$ and sybil $S$ vertices, and let $\mathcal{G}_{V}$ be a community history over $V$. Then, a genuine vertex $t\in T$ is corrupt in $\mathcal{G}_{V}$ if it trusts a sybil at any time in $\mathcal{G}_{V}$, namely, there is some $(t,s)\in E$, with $t\in T$, $s\in S$, for some $G=(A,V,E)\in\mathcal{G}_{V}$. A genuine vertex that is not corrupt is said to be honest. Thus, $\mathcal{G}_{V}$ partitions the genuine identities $T=H\uplus C$ into honest $H$ and corrupt $C$ identities. An edge $(h,c)\in E$ is an attack edge if $h\in H$ and $c\in C$. The sybil penetration $\sigma(G)$ of a community trust graph $G=(A,V,E)\in\mathcal{G}_{V}$ is

$$\sigma(G)=\frac{|A\cap S|}{|A|}$$
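Definition 6 can be turned into a small labelling routine for simulations. The sketch below is illustrative only: it assumes the ground-truth sets of genuine and sybil identities are known (which they are not in a deployed system), represents the history as a list of edge sets, one per step, and uses hypothetical function names.

```python
def classify(edge_history, genuine, sybils):
    """Split genuine identities into (honest, corrupt) over a whole history.

    A genuine identity is corrupt if it shares an edge with a sybil in
    any step of the history; otherwise it is honest.
    """
    genuine, sybils = set(genuine), set(sybils)
    corrupt = set()
    for edges in edge_history:
        for u, v in edges:
            if u in genuine and v in sybils:
                corrupt.add(u)
            if v in genuine and u in sybils:
                corrupt.add(v)
    return genuine - corrupt, corrupt

def attack_edges(edges, honest, corrupt):
    """Edges joining an honest identity to a corrupt one."""
    return {(u, v) for u, v in edges
            if (u in honest and v in corrupt) or (u in corrupt and v in honest)}

def sybil_penetration(community, sybils):
    """sigma(G) = |A ∩ S| / |A|."""
    community = set(community)
    return len(community & set(sybils)) / len(community)

history = [
    {("a", "b"), ("b", "c")},               # step 1: c has not yet trusted a sybil
    {("a", "b"), ("b", "c"), ("c", "s1")},  # step 2: c trusts sybil s1
]
honest, corrupt = classify(history, {"a", "b", "c"}, {"s1", "s2"})
print(honest, corrupt)
print(attack_edges(history[0], honest, corrupt))
print(sybil_penetration({"a", "b", "c", "s1"}, {"s1", "s2"}))
```

In this toy history the edge between `b` and `c` counts as an attack edge already in step 1, before `c` has connected to any sybil, because corruption is defined over the whole history.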

Remark 3.

An important observation is that an attack edge $(h,c)$ may be introduced into a community trust graph in a community history, and be defined as such, even if the corruption of $c$ is still latent in this community trust graph, namely before a trust edge $(c,s)$ between $c$ and a sybil $s$ is introduced.

In the worst case, sybils and their corrupt perpetrators would cooperate; thus, to allow for incremental community growth, the model must bound their combined presence in the community, as defined next:

Definition 7 (Byzantines and their Penetration).

Let $\mathcal{G}_{V}$ be a community history over $V=T\uplus S$ that partitions $T=H\uplus C$ into honest $H$ and corrupt $C$ identities. Then, a vertex $v\in V$ is byzantine if it is a sybil or corrupt and the byzantines $B=S\uplus C$ are the union of the sybil and corrupt vertices. The byzantine penetration $\beta(G)$ of a community trust graph $G=(A,V,E)\in\mathcal{G}_{V}$ is

$$\beta(G)=\frac{|A\cap B|}{|A|}$$

As $A=(A\cap H)\uplus(A\cap B)$, it will occasionally be convenient to use the equivalence between the byzantine penetration of the community $A$ and the ratio of byzantines to honest identities in $A$. Formally,

$$\frac{|A\cap B|}{|A|}\leq\beta\quad\text{iff}\quad\frac{|A\cap B|}{|A\cap H|}\leq\frac{\beta}{1-\beta}$$
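The equivalence follows by substituting $|A|=|A\cap H|+|A\cap B|$ and rearranging. The steps below are a short worked derivation, assuming $\beta<1$ and $A\cap H\neq\emptyset$ so that both divisions are defined:

$$\begin{aligned}\frac{|A\cap B|}{|A|}\leq\beta&\iff|A\cap B|\leq\beta\left(|A\cap H|+|A\cap B|\right)\\&\iff(1-\beta)\,|A\cap B|\leq\beta\,|A\cap H|\\&\iff\frac{|A\cap B|}{|A\cap H|}\leq\frac{\beta}{1-\beta}\end{aligned}$$

For example, $\beta=\frac{1}{3}$ corresponds to $\frac{\beta}{1-\beta}=\frac{1}{2}$: at most one byzantine for every two honest identities in the community.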

4 Conductance-Based Approach

The goal of this section is to find the conditions under which a community can grow while bounding the penetration of byzantines and sybils. The reader may read the following recipe as high-level instructions to achieve this goal:

1. Start with an initial community.

2. Choose the desired bound on byzantine penetration.

3. Measure the fraction of edges within the community, out of all edges stemming out of the community.

4. Estimate a bound on the connectivity between honest and sybil/byzantine identities.

5. Admit new candidates to the community only if the connectivity within the target community is sufficiently large.

The following provides sufficient conditions for byzantine-resilient community growth, under the assumption that honest people tend to trust honest people and distrust corrupt people.

Theorem 1.

Let $\mathcal{G}_{V}$ be a community history. Set parameters $\alpha\in[0,1],\beta\leq\frac{1}{2}-\frac{1}{|A_{1}|},\gamma_{e}\in[0,\frac{1}{2}],\delta=1-2\beta$ . Assume:

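1. All communities have a bounded degree, both above and below:

$$\alpha\cdot d\leq\deg_{A_{i}}(v)\leq d\quad\text{for all }v\in A_{i},\ i\in\mathbb{N}$$

2. Byzantine penetration to the initial community is bounded:

$$\beta(G_{1})\leq\beta$$

3. The edges between honest and byzantine identities are relatively scarce:

$$\frac{e(A_{i}\cap H,A_{i}\cap B)}{\mathrm{vol}_{A_{i}}(A_{i}\cap H)}\leq\gamma_{e}$$

4. Community growth is bounded:

$$|A_{i}\setminus A_{i-1}|\leq\delta|A_{i-1}|$$

5. The conductance within $A_{i}$ is sufficiently high:

$$\Phi_{e}(G|_{A_{i}})>\frac{\gamma_{e}}{\alpha}\cdot\left(\frac{1-\beta}{\beta}\right)$$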

Then, every community $G_{i}\in\mathcal{G}_{V}$ has Byzantine penetration $\beta(G_{i})\leq\beta$ .

Roughly speaking, Theorem 1 states that whenever (1) each graph $G_{i}|_{A_{i}}$ has a bounded degree, both above and below; (2) byzantine penetration to $A_{1}$ is bounded; (3) edges between honest and byzantine identities are scarce; (4) community growth in each step is bounded; and (5) the conductance within $G_{i}|_{A_{i}}$ is sufficiently high, then the community may grow indefinitely with bounded byzantine penetration.

Theorem 1 follows by induction from the following Lemma:

Lemma 1.

Let $G=(A,V,E)$ and $G^{\prime}=(A^{\prime},V,E)$ be two community trust graphs, where $A\subset A^{\prime}$ . Set parameters $\alpha\in[0,1]$ and $\beta,\gamma,\delta\in[0,\frac{1}{2}]$ . Assume:

1. Each vertex in $A^{\prime}$ has a bounded degree, both above and below:

[TABLE]

2. Byzantine penetration to the initial community is bounded:

[TABLE]

3. The edges between honest and byzantine identities are relatively scarce:

[TABLE]

4. Community growth is bounded:

[TABLE]

5. The conductance within $A^{\prime}$ is sufficiently high:

[TABLE]

Then, $\beta(G^{\prime})\leq\beta$.

Proof.

First note that even if all the added identities from $A$ to $A^{\prime}$ are byzantines, it still follows that

[TABLE]

Applying assumption (2):

[TABLE]

Applying assumption (4):

[TABLE]

As $V=B\uplus H$ , it follows that:

[TABLE]

Now utilizing assumption (1):

[TABLE]

Similarly, the following holds:

[TABLE]

Inequalities 2 and 4 imply that:

[TABLE]

and together with Inequality 4:

[TABLE]

Now, Inequality 5 and assumption (5) imply that:

[TABLE]

or equivalently:

[TABLE]

Assumptions (1) and (3) imply:

[TABLE]

or equivalently:

[TABLE]

Combining Inequalities 6 and 7:

[TABLE]

where the first equality holds as $A=(A\cap{H})\uplus(A\cap B)$, the second inequality stems from Inequality 7, and the third inequality stems from Inequality 6. Flipping the numerator and the denominator then gives $\beta(A^{\prime}):=\frac{|A^{\prime}\cap B|}{|A^{\prime}|}<\beta$. ∎

Remark 4.

A potential application of Lemma 1 is a byzantine-resilient union of two communities. Let $A,A^{\prime}\subseteq V$ denote two communities that overlap (have a non-empty intersection) and wish to unite into $A_{2}:=A\cup A^{\prime}$. If Lemma 1 holds for $(A_{1},A_{2})$ both when $A_{1}:=A$ and when $A_{1}:=A^{\prime}$, then both $A$ and $A^{\prime}$ obtain the necessary guarantee that the union would not increase the sybil penetration rate for either community.

5 Analysis of the Conductance-Based Approach

Our results show the conditions under which a community can grow while maintaining sybil safety. It is not yet clear, however, whether such conditions are practical. This section takes a closer look at graphs, graph conductance, and the interplay between the parameters. We show that, for the range of possible parameters in the model and the conductance they require, there are indeed many graphs that meet the requirements. Theoretically, a fully connected graph easily satisfies these requirements, but trust graphs are rather sparse, so the specific question is whether sparse graphs can satisfy them.

5.1 Sparse Graphs

Recall that the safety of community growth, and more specifically the level of conductance required for the community to grow safely, relies upon the parameters $\alpha$, $\beta$, and $\gamma_{e}$. While a given community may evolve w.r.t. any choice of parameters, some choices inevitably yield degenerate outcomes. For instance, the model requires $\Phi_{e}(G|_{A^{\prime}})>\frac{\gamma_{e}}{\alpha}\cdot\left(\frac{1-\beta}{\beta}\right)$, while the conductance of any graph is upper bounded by $\frac{1}{2}$. Specifically, whenever $\gamma_{e}\left(\frac{1-\beta}{\beta}\right)>\frac{1}{2}$, the community cannot possibly grow, regardless of the choice of $\alpha$: since $\alpha\leq 1$, the required conductance is at least $\gamma_{e}\left(\frac{1-\beta}{\beta}\right)$. While complete graphs and complete bipartite graphs are the classic examples of graphs that satisfy $\Phi_{e}(G|_{A^{\prime}})=\frac{1}{2}$, the fact that their degree is of order $d=\Theta(n)$ makes them unrealistic in our setting, where agents may potentially trust only a uniformly bounded number of identities. In this context, the main question seems to be the following: could a given community safely grow while retaining a given maximal degree $d$? Surprisingly, not only is the answer affirmative, but it also holds for a plethora of trust graphs. We utilize Friedman's classical result:

Theorem 2.

(Friedman [11], rephrased) Let $G$ be a random $d$-regular graph on $n$ vertices. Then, for any $\epsilon>0$, $\lambda(G)\leq\frac{2\sqrt{d-1}}{d}+\epsilon$ holds with probability $1-o_{n}(1)$.

Thus, almost all $d$-regular graphs on $n$ vertices satisfy $\lambda_{2}\leq\frac{2}{\sqrt{d}}$. Substituting this bound into Cheeger's inequality yields that such graphs satisfy

$$\Phi_{e}(G)\geq\frac{1-\lambda_{2}}{2}\geq\frac{1}{2}-\frac{1}{\sqrt{d}},\qquad(8)$$

meaning that the choice of $d$ affects the level of conductance one hopes to achieve.
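To see how the degree drives the achievable conductance, the sketch below evaluates the two bounds numerically. It assumes Cheeger's inequality in the form $\Phi_{e}\geq\frac{1-\lambda_{2}}{2}$ for $d$-regular graphs together with $\lambda_{2}\leq\frac{2}{\sqrt{d}}$; the function names are illustrative.

```python
import math


def friedman_lambda_bound(d, eps=0.0):
    """Bound 2*sqrt(d - 1)/d + eps on lambda(G) from Theorem 2."""
    if d < 2:
        raise ValueError("d must be at least 2")
    return 2.0 * math.sqrt(d - 1) / d + eps


def conductance_lower_bound(d):
    """1/2 - 1/sqrt(d), i.e. (1 - lambda_2)/2 with lambda_2 <= 2/sqrt(d)."""
    if d < 2:
        raise ValueError("d must be at least 2")
    return 0.5 - 1.0 / math.sqrt(d)


# The bound is vacuous for d = 4 and approaches 1/2 as d grows.
for d in (4, 16, 100, 400):
    print(d, round(friedman_lambda_bound(d), 4),
          round(conductance_lower_bound(d), 4))

benchmark = conductance_lower_bound(100)
```

Under these assumptions, degree $d=100$ gives a lower bound of $\frac{2}{5}$, while small degrees give little or no guarantee.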

5.2 Parameter Interplay

This subsection considers numerical examples to better appreciate the analysis above. First, consider the realistic assumption that each identity trusts up to $d=100$ identities (notice that this can be enforced by the system). Equation 8 now suggests that a random graph of degree $d$ on $n$ vertices (where $d$ may be constant w.r.t. $n$) satisfies $\Phi_{e}>\frac{2}{5}$. For simplicity, we take this quantity as a benchmark. It follows that whenever $\frac{\gamma_{e}}{\alpha}\cdot\left(\frac{1-\beta}{\beta}\right)<\frac{2}{5}$, there exists a plethora of potential community histories for which a given community may grow to be arbitrarily large. Some further examples:

1. If $\gamma_{e}=0$, then any community history that begins with a connected byzantine-free community would retain $0$-byzantine penetration;
2. The choice $\beta=0$ is not attainable, corresponding to the intuition that one can never guarantee a completely byzantine-free community growth.

Figure 3 illustrates the parameter interplay further. Notice that the key assumption, stating that honest people tend to trust honest people more than they tend to trust corrupt people, implies that $\gamma_{e}<\beta$ (as $\gamma_{e}>\beta$ implies that honest people trust corrupt people more than their relative share in the community).

Footnote 2: In a separate line of research (in preparation) we consider processes and mechanisms that help lower $\gamma_{e}$ even further.
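The interplay can also be explored numerically. The sketch below inverts the requirement $\Phi_{e}>\frac{\gamma_{e}}{\alpha}\cdot\frac{1-\beta}{\beta}$ from Section 5.1 to find how large $\gamma_{e}$ may be for a given benchmark conductance. The function names and the choice $\alpha=1$ are assumptions made for illustration.

```python
def required_conductance(alpha, beta, gamma_e):
    """Threshold (gamma_e / alpha) * (1 - beta) / beta from Section 5.1."""
    if not (0 < alpha <= 1 and 0 < beta < 1 and gamma_e >= 0):
        raise ValueError("expected 0 < alpha <= 1, 0 < beta < 1, gamma_e >= 0")
    return (gamma_e / alpha) * (1.0 - beta) / beta


def max_gamma_e(alpha, beta, phi):
    """Supremum of gamma_e for which the threshold stays below phi."""
    return phi * alpha * beta / (1.0 - beta)


benchmark = 0.4  # the 2/5 benchmark used in the text
rows = []
for beta in (0.1, 0.2, 0.3):
    limit = max_gamma_e(alpha=1.0, beta=beta, phi=benchmark)
    # Third entry records whether the limit respects gamma_e < beta.
    rows.append((beta, limit, limit < beta))
    print(f"beta={beta:.1f}: gamma_e must stay below {limit:.4f}")
```

For these sample values the tolerable $\gamma_{e}$ is always smaller than $\beta$, in line with the key assumption discussed above.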

5.3 Parameter Estimation

While $\alpha$ and $\Phi_{e}$ can be decided by the community (either by the foremothers of the community or by a global, decentralized democratic decision-making process), $\beta(G)$ and $\gamma_{e}$ depend on the dynamics of the community history. To grow the community incrementally at a given time, one may settle for estimating the current state of affairs. Specifically, assuming that a thorough examination of a given identity could determine whether it is genuine or sybil, one may apply random checks to empirically estimate $\beta(G)$ and $\gamma_{e}$. This could be carried out in the following manner (footnote 3: a related sampling-based approach to estimate the number of sybils is briefly discussed by Shahaf et al. [19, Remark 2]):

1. Examination of an identity $x\in V$ determines whether it is genuine or sybil.
2. Examination of the neighbors of a genuine identity $x\in V$ (the ball of radius $1$ around it) determines whether it is explicitly (but not latently) corrupt.
3. Examination of the ball of radius $2$ around an honest identity $x$ determines whether its neighbors are explicitly byzantine.
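A random-check estimate of $\beta(G)$ along these lines might look as follows. This is only a sketch: the examination itself is abstracted into a hypothetical oracle `is_byzantine`, the community and its hidden byzantines are synthetic, and estimating $\gamma_{e}$ would additionally require sampling edges, which is not shown.

```python
import random


def estimate_penetration(community, is_byzantine, sample_size, seed=None):
    """Estimate beta(G) from a uniform random sample of community members.

    `is_byzantine` is a hypothetical oracle standing in for the thorough
    examination of a single identity described in the text.
    """
    members = sorted(community)
    if not members or sample_size < 1:
        raise ValueError("need a non-empty community and a positive sample size")
    rng = random.Random(seed)
    sample = rng.sample(members, min(sample_size, len(members)))
    return sum(1 for x in sample if is_byzantine(x)) / len(sample)


# Synthetic community of 1000 identities with 100 hidden byzantines.
community = range(1000)
hidden_byzantines = set(range(100))  # ground truth, unknown in practice
oracle = hidden_byzantines.__contains__

full_estimate = estimate_penetration(community, oracle, 1000, seed=0)
spot_estimate = estimate_penetration(community, oracle, 200, seed=0)
print(full_estimate, spot_estimate)
```

Examining every member recovers the true penetration, whereas a smaller sample trades accuracy for fewer examinations.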

6 Vertex Expansion Approach

This section presents our second assumption, which focuses on the corrupt identities themselves rather than on the trust between honest identities and corrupt identities. Thus, we simply assume that there is a bound on how many identities in a community are corrupt. In a trust graph this results in a limited number of vertices on the boundary between honest identities and sybil identities. The following provides sufficient conditions for byzantine-resilient community growth, under the assumption that the population of corrupt identities in the community is bounded. This time we use vertex expansion to derive a bound on the number of byzantine identities.

Theorem 3.

Let $\mathcal{G}_{V}=G_{1},G_{2},\ldots$ be a community history over $V$ . Let $\beta\leq\frac{1}{2}-\frac{1}{2|A_{1}|}$ , $\gamma_{v}\in[0,\frac{1}{2}]$ , and $\delta=1-2\beta$ . Assume:

1. Byzantine penetration to the initial community is bounded:

$$\beta(G_{1})\leq\beta.$$

2. The population of corrupt identities is bounded:

[TABLE]

3. Community growth is bounded:

[TABLE]

4. The vertex expansion within $A_{i}$ is sufficiently high:

$$\Phi_{v}(G_{i}|_{A_{i}})>\frac{\gamma_{v}}{\beta}.$$

Then, every community $G_{i}\in\mathcal{G}_{V}$ has Byzantine penetration $\beta(G_{i})\leq\beta$ .

Notice that the vertex-based version of the model has one fewer parameter: $\alpha$ is absent. It was required in the edge-based version to establish a lower bound on the volume of $H$, and it has a strong intuition for our goal (the more honest identities trust each other, the harder it is for the untrusted to penetrate their community), but the theorem for the vertex-based version holds without it. This makes this version slightly simpler, as there is one fewer parameter that the community needs to decide upon.

As before, Theorem 3 follows by induction from the following Lemma:

Lemma 2.

Let $G=(A,V,E)$ and $G^{\prime}=(A^{\prime},V,E)$ be two community trust graphs, where $A\subset A^{\prime}$ . Set parameters $\beta,\gamma,\delta\in[0,\frac{1}{2}]$ . Assume:

1. Byzantine penetration to the initial community is bounded:

[TABLE]

2. The population of corrupt identities is bounded in $A^{\prime}$:

[TABLE]

3. Community growth is bounded:

[TABLE]

4. The vertex expansion within $A^{\prime}$ is sufficiently high:

[TABLE]

Then, $\beta(G^{\prime})\leq\beta$.

Proof.

Similarly to the proof of Lemma 1, assumptions (1) and (3) imply that:

[TABLE]

Inequality 9 and assumption (4) imply that:

[TABLE]

where the last inequality stems from Definition 6 (there are no edges between $H$ and $S$, therefore the boundary between $B$ and $H$ is a subset of $C$). Applying assumption (2), it follows that:

[TABLE]

which leads to

[TABLE]

That is, $G^{\prime}$ has byzantine penetration $\beta(G^{\prime})\leq\beta$ . ∎

Remark 5.

Our two results for community growth, one based on conductance and the other based on vertex expansion, are very similar. The main difference between them lies in their premises. The first assumes that honest people tend to trust honest people more than they tend to trust corrupt people. The second, which may be more naïve, directly assumes that there are not too many corrupt people in a given community to begin with. Again, the conditions under which either of these bounds can be assumed to be low are the subject of a separate line of work.

7 Analysis of the Vertex Expansion Approach

Given a $d$-regular graph, it can be shown that the inner boundary vertex expansion of the graph is at least as high as the graph conductance. Assume w.l.o.g. that $|A|\leq|A^{c}|$. Since each vertex on the inner boundary is incident to at most $d$ edges crossing to $A^{c}$, we have $\partial_{v}(A,A^{c})\cdot d\geq e(A,A^{c})$, and it follows that:

$$\frac{\partial_{v}(A,A^{c})}{|A|}\geq\frac{e(A,A^{c})}{d\cdot|A|}=\frac{e(A,A^{c})}{vol(A)}.$$
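The relation can be checked by brute force on a small regular graph. The sketch below uses a 6-cycle ($d=2$) and assumes that the inner boundary of $A$ consists of the vertices of $A$ with at least one neighbor in $A^{c}$, and that both quantities are minimized over non-empty subsets with $|A|\leq|A^{c}|$; all names are illustrative.

```python
from itertools import combinations


def cycle_graph(n):
    """Adjacency sets of the n-cycle, a 2-regular graph."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


def small_subsets(adj):
    """All non-empty vertex subsets A with |A| <= |A^c|."""
    vertices = list(adj)
    for size in range(1, len(vertices) // 2 + 1):
        for subset in combinations(vertices, size):
            yield set(subset)


def edge_conductance(adj, d):
    """min over A of e(A, A^c) / (d * |A|) for a d-regular graph."""
    return min(
        sum(1 for u in s for w in adj[u] if w not in s) / (d * len(s))
        for s in small_subsets(adj)
    )


def vertex_expansion(adj):
    """min over A of |inner boundary of A| / |A|."""
    return min(
        sum(1 for u in s if adj[u] - s) / len(s)
        for s in small_subsets(adj)
    )


graph = cycle_graph(6)
phi_e = edge_conductance(graph, d=2)
phi_v = vertex_expansion(graph)
print(phi_e, phi_v)
```

On the 6-cycle the minimizing set for both quantities is an arc of three consecutive vertices, and the vertex expansion is indeed the larger of the two values.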

Going back to the numeric example in Subsection 5.2 and now setting $\Phi_{v}=\frac{2}{5}$, it follows that whenever $\frac{\gamma_{v}}{\beta}<\frac{2}{5}$, there exists a plethora of potential community histories for which a given community may grow to be arbitrarily large. As an example, if the community wishes to achieve $\beta=0.2$, then it can tolerate any $\gamma_{v}$ below $0.08$. Figure 4 illustrates the parameter interplay further. The line $\Phi_{v}=1$ shows a theoretical example where for each subset $A\subset V$ and every $x\in A$ there exists $y\in A^{c}$ such that $(x,y)\in E$. Assuming there is at least one honest identity in the community, and remembering that there cannot be an edge between an honest identity and a sybil identity, it follows that there are no sybils in any such community in $V$. The line $\Phi_{v}=1$ expresses this result, as it shows that $\gamma_{v}=\beta$, which leads to $S=\emptyset$.

Maintaining $\Phi_{v}=0.5$ leads to $\beta=2\gamma_{v}$, which means that the number of sybils in any such community is at most the number of corrupt identities that are willing to share an edge with a sybil identity. Unfortunately, the downside of using vertex expansion over conductance is that, as far as we know, there is no known way to measure or approximate vertex expansion better than through the relation between vertex expansion and conductance shown above. We are also unaware of any method to construct a graph with vertex expansion $0.5$ or higher and a constant degree $d$.
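The numeric cases discussed in this section follow from the boundary relation $\gamma_{v}=\Phi_{v}\cdot\beta$. The short sketch below reproduces them; the function names are illustrative, and the values returned are limiting values rather than attainable ones where the condition is a strict inequality.

```python
def max_corrupt_fraction(beta, phi_v):
    """Limiting gamma_v for a target beta: gamma_v / beta must stay below phi_v."""
    return phi_v * beta


def penetration_bound(gamma_v, phi_v):
    """Limiting beta for a given corrupt fraction gamma_v."""
    if phi_v <= 0:
        raise ValueError("phi_v must be positive")
    return gamma_v / phi_v


print(max_corrupt_fraction(beta=0.2, phi_v=0.4))    # close to 0.08
print(penetration_bound(gamma_v=0.05, phi_v=0.5))   # beta = 2 * gamma_v
print(penetration_bound(gamma_v=0.05, phi_v=1.0))   # beta = gamma_v
```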

8 Outlook

We proposed two methods that allow a digital community to grow in a sybil-safe way. We analyzed them mathematically and showed that they are not only safe, but also feasible. Future research includes mechanisms for penalizing the creation of attack edges while rewarding sybil hunting, modeling the possibility of honest identities abandoning the community, and using simulations to better understand the dynamics of safe growth.

Acknowledgements

We thank the Braginsky Center for the Interface between Science and the Humanities for their generous support.

Bibliography


[1] N. Alon, E. Mossel, and R. Pemantle. Corruption detection on networks. arXiv preprint arXiv:1505.05637, 2015.
[2] L. Alvisi, A. Clement, A. Epasto, S. Lattanzi, and A. Panconesi. SoK: The evolution of sybil defense via social networks. In Proceedings of SP '13, pages 382–396, 2013.
[3] M. Blum, R. M. Karp, O. Vornberger, C. H. Papadimitriu, and M. Yannakakis. The complexity of testing whether a graph is a superconcentrator. Information Processing Letters, 13(4-5):164–167, 1981.
[4] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro. Aiding the detection of fake accounts in large scale social online services. In Proceedings of NSDI '12, pages 15–15, 2012.
[5] Jeff Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In Proceedings of the Princeton conference in honor of Professor S. Bochner, pages 195–199, 1969.
[6] Fan R.K. Chung. Spectral graph theory. Number 92 in CBMS Regional Conference Series in Mathematics. American Mathematical Soc., 1997.
[7] G. Danezis and P. Mittal. SybilInfer: Detecting sybil nodes using social networks. In Proceedings of NDSS '09, pages 1–15, 2009.
[8] Reinhard Diestel. Graph Theory (Graduate Texts in Mathematics). Springer, 2017.

A preliminary version of this paper was presented at the 14th International Computer Science Symposium in Russia, July 1-5, 2019, Novosibirsk, Russia [18].