Corrected overlap weight and clustering coefficient

Vladimir Batagelj

arXiv:1906.04581·cs.SI·February 6, 2020


TL;DR

This paper identifies limitations in the traditional overlap weight and clustering coefficient measures for network analysis and proposes corrected definitions that better identify important network elements, demonstrated on the US Airports network.

Contribution

The author introduces corrected versions of the overlap weight and clustering coefficient measures to improve their usefulness in data analysis tasks.

Findings

01

Corrected measures provide more meaningful identification of important nodes and links.

02

Application on US Airports network demonstrates the effectiveness of the corrected measures.

03

Traditional measures tend to highlight small maximal subgraphs, which can be misleading.

Abstract

We discuss two well known network measures: the overlap weight of an edge and the clustering coefficient of a node. For both of them it turns out that they are not very useful for the data analytic task of identifying important elements (nodes or links) of a given network. The reason for this is that they attain their largest values on maximal subgraphs of relatively small size, which are more likely to appear in a network than larger ones. We show how the definitions of these measures can be corrected in such a way that they give the expected results. We illustrate the proposed corrected measures by applying them to the US Airports network using the program Pajek.

Tables

Table 1: Largest triangular weights in US Airports 1997 network

$u$	$v$	$t(e)$	$\deg(u)$	$\deg(v)$	$o'(e)$
Chicago O’hare Intl	Pittsburgh Intll	80	139	94	0.57971
Chicago O’hare Intl	Lambert-St Louis Intl	80	139	94	0.57971
Chicago O’hare Intl	Dallas/Fort Worth Intl	78	139	118	0.55714
Chicago O’hare Intl	The W B Hartsfield Atlanta	77	139	101	0.54610
The W B Hartsfield Atlanta	Charlotte/Douglas Intl	76	101	87	0.73077
The W B Hartsfield Atlanta	Dallas/Fort Worth Intl	73	101	118	0.58871

Table 2: US Airports 1997 with clustering coefficient $cc(u)=1$ and degree at least 4

$n$	$\deg$	airport	$n$	$\deg$	airport
1	7	Lehigh Valley Intll	8	4	Gunnison County
2	5	Evansville Regional	9	4	Aspen-Pitkin Co/Sardy Field
3	5	Stewart Int’l	10	4	Hector Intll
4	5	Rio Grande Valley Intl	11	4	Burlington Regional
5	5	Tallahassee Regional	12	4	Rafael Hernandez
6	4	Myrtle Beach Intl	13	4	Wilkes-Barre/Scranton Intl
7	4	Bishop Intll	14	4	Toledo Express

Table 3: US Airports 1997 with the largest corrected clustering coefficient

Rank	$cc'(u)$	$\deg(u)$	Airport
1	0.3739	45	Cleveland-Hopkins Intl
2	0.3700	50	General Edward Lawrence Logan
3	0.3688	56	Orlando Intl
4	0.3595	42	Tampa Intl
5	0.3488	61	Cincinnati/Northern Kentucky Intl
6	0.3457	70	Detroit Metropolitan Wayne County
7	0.3455	67	Newark Intl
8	0.3429	53	Baltimore-Washington Intl
9	0.3415	47	Miami Intl
10	0.3405	42	Washington National
11	0.3379	56	Nashville Intll
12	0.3359	46	John F Kennedy Intl
13	0.3347	62	Philadelphia Intl
14	0.3335	41	Indianapolis Intl
15	0.3335	50	La Guardia

Equations

\min(\deg(u),\deg(v)) - 1 - t(e)

\frac{t(e)}{n-2} \quad\mbox{or}\quad \frac{t(e)}{\mu}

o(e) = \frac{t(e)}{(\deg(u)-1) + (\deg(v)-1) - t(e)}

J(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}

|X \cup Y| = |X| + |Y| - |X \cap Y| = (\deg(u)-1) + (\deg(v)-1) - t(e).

O(e) = O(X,Y) = \frac{|X \cap Y|}{\max(|X|,|Y|)} = \frac{t(e)}{\max(\deg(u),\deg(v)) - 1}.

m(e) = \min(\deg(u),\deg(v)) - 1 \quad\mbox{and}\quad M(e) = \max(\deg(u),\deg(v)) - 1

o(e) = \frac{t(e)}{m(e) + M(e) - t(e)}, \quad M(e) > 0

m(e) + M(e) - t(e) \geq t(e) + t(e) - t(e) = t(e)

o_{t}(a) = \frac{t_{t}(a)}{(\mathrm{outdeg}(u)-1) + (\mathrm{indeg}(v)-1) - t_{t}(a)}

o_{c}(a) = \frac{t_{c}(a)}{\mathrm{indeg}(u) + \mathrm{outdeg}(v) - t_{c}(a)}

o'(e) = \frac{t(e)}{\mu + M(e) - t(e)}

o'(e, \mathbf{G} \cup f) = \frac{t'(e)}{\mu + M'(e) - t'(e)} = \frac{t(e)+1}{\mu + M(e) - t(e)} > o'(e, \mathbf{G})

o'(e, \mathbf{G} \cup f) = \frac{t'(e)}{\mu + M'(e) - t'(e)} = \frac{t(e)+1}{\mu + M(e) - t(e) - 1} > \frac{t(e)+1}{\mu + M(e) - t(e)} > o'(e, \mathbf{G})

cc(u) = \frac{|{\cal E}(N(u))|}{|{\cal E}(K_{\deg(u)})|} = \frac{2 \cdot E(u)}{\deg(u) \cdot (\deg(u)-1)}, \quad \deg(u) > 1

E(u) = \frac{1}{2} \sum_{e \in S(u)} t(e)

cc'(u) = \frac{2 \cdot E(u)}{\mu \cdot \deg(u)}, \quad \deg(u) > 0

cc'(u, \mathbf{G} \cup f) = \frac{2 \cdot E'(u)}{\mu \cdot \deg'(u)} = \frac{2 \cdot (E(u)+1)}{\mu \cdot \deg(u)} > cc'(u, \mathbf{G})

2 \cdot E(u) = \sum_{v \in N(u)} \deg_{N(u)}(v) \leq \sum_{v \in N(u)} \mu = \mu \cdot \deg(u)

2 \cdot E(u) \leq \deg(u) \cdot (\deg(u)-1) \leq \mu \cdot \deg(u)


Full text

Corrected overlap weight and clustering coefficient

Vladimir Batagelj

Institute of Mathematics, Physics and Mechanics,

Department of Theoretical Computer Science,

Jadranska 19, 1000 Ljubljana, Slovenia

and

University of Primorska, Andrej Marušič Institute,

Muzejski trg 2, Koper, Slovenia

and

National Research University Higher School of Economics,

Myasnitskaya, 20, 101000 Moscow, Russia


ORCID: 0000-0002-0240-9446


Keywords: social network analysis, importance measure, triangular weight, overlap weight, clustering coefficient.

Mathematics Subject Classification 2010: 91D30, 91C05, 05C85, 68R10, 05C42.

1 Introduction

1.1 Network element importance measures

To identify important / interesting elements (nodes, links) in a network, we often try to express our intuition about their importance using an appropriate measure (node index, link weight) following the scheme

the larger the measure value of an element, the more important / interesting this element is.

Too often, in analysis of networks, researchers uncritically pick some measure from the literature (degrees, closeness, betweenness, hubs and authorities, clustering coefficient, etc. (Wasserman and Faust, 1995; Todeschini and Consonni, 2009)) and apply it to their network.

In this paper we discuss two well known network local density measures: the overlap weight of an edge (Onnela et al., 2007) and the clustering coefficient of a node (Holland and Leinhardt, 1971; Watts and Strogatz, 1998).

For both of them it turns out that they are not very useful for the data analytic task of identifying important elements of a given network. The reason is that they attain their largest values on maximal subgraphs of relatively small size, and such subgraphs are more likely to appear in a network than larger ones. We show how their definitions can be corrected in such a way that they give the expected results. We illustrate the proposed corrected measures by applying them to the US Airports network using the program Pajek. We will limit our attention to undirected simple graphs $\mathbf{G}=({\cal V},{\cal E})$.

Many similar indices and weights were proposed by the graph drawing community for disentanglement in the visualization of hairball networks (Melançon and Sallaberry, 2008; Nocaj et al., 2015, 2016).

When searching for important subnetworks in a given network, we often assume that, as the network evolves, increased activity in some part of it creates new nodes and edges in that part, increasing its local density. We expect a local density measure $ld(x,\mathbf{G})$ for an element (node/link) $x$ of the network $\mathbf{G}$ to have the following properties:

ld1.

adding an edge, $e$ , to the local neighborhood, $\mathbf{G}^{(1)}$ , does not decrease the local density

$ld(x,\mathbf{G})\leq ld(x,\mathbf{G}\cup e)$ .

ld2.

normalization: $0\leq ld(x,\mathbf{G})\leq 1$ .

ld3.

$ld(x,\mathbf{G})$ can attain value 1, $ld(x,\mathbf{G})=1$, on the largest subnetwork of a certain type in the network.

2 Overlap weight

2.1 Overlap weight

A direct measure of the overlap of an edge $e=(u:v)\in{\cal E}$ in an undirected simple graph $\mathbf{G}=({\cal V},{\cal E})$ is the number of common neighbors of its end nodes $u$ and $v$ (see Figure 1). It is equal to $t(e)$ – the number of triangles (cycles of length 3) to which the edge $e$ belongs. The edge neighbors subgraph is labeled $T(\deg(u)-t(e)-1,t(e),\deg(v)-t(e)-1)$ – the subgraph in Figure 1 is labeled $T(4,5,3)$ . There are two problems with this measure:

•

it is not normalized (bounded to $[0,1]$ );

•

it does not consider the ‘potentiality’ of nodes $u$ and $v$ to form triangles: there are $\min(\deg(u),\deg(v))-1-t(e)$ nodes in the smaller set of neighbors that are not in the other set of neighbors.

Two simple normalizations are:

$$\frac{t(e)}{n-2}\quad\mbox{or}\quad\frac{t(e)}{\mu}$$

where $n=|{\cal V}|$ is the number of nodes, and $\mu=\max_{e\in{\cal E}}t(e)$ is the maximum number of triangles on an edge in the graph $\mathbf{G}$ .
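The triangle weight $t(e)$, the maximum $\mu$ and the two normalizations can be computed directly from adjacency sets. The following is a minimal Python sketch, assuming the graph is given as a list of undirected edges; the helper names (`adjacency`, `triangle_weights`) and the toy graph are hypothetical and are not taken from the author's programs.

```python
def adjacency(edges):
    """Adjacency sets of an undirected simple graph given as a list of node pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_weights(edges):
    """t(e): number of common neighbors of the end nodes, i.e. triangles on e."""
    adj = adjacency(edges)
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}


# Hypothetical toy graph: a 4-clique {a, b, c, d} plus a pendant edge d-e.
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
t = triangle_weights(edges)
n = len(adjacency(edges))   # number of nodes, here 5
mu = max(t.values())        # maximum number of triangles on an edge, here 2

# The two simple normalizations t(e)/(n-2) and t(e)/mu (mu > 0 assumed).
norm_n = {e: te / (n - 2) for e, te in t.items()}
norm_mu = {e: te / mu for e, te in t.items()}
print(t[("a", "b")], mu, norm_mu[("a", "b")], norm_mu[("d", "e")])  # 2 2 1.0 0.0
```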

The (topological) overlap weight of an edge $e=(u:v)\in{\cal E}$ considers also the degrees of edge’s end nodes and is defined as

$$o(e)=\frac{t(e)}{(\deg(u)-1)+(\deg(v)-1)-t(e)}.$$

In the case $\deg(u)=\deg(v)=1$ we set $o(e)=0$. This definition partially resolves both problems.
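A minimal Python sketch of the overlap weight, including the convention $o(e)=0$ for an isolated edge. The function name `overlap_weight` and the toy graph are hypothetical illustrations.

```python
def overlap_weight(edges):
    """Topological overlap weight o(e) for every edge of an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    o = {}
    for u, v in edges:
        t = len(adj[u] & adj[v])                          # triangles on e
        denom = (len(adj[u]) - 1) + (len(adj[v]) - 1) - t
        # denom == 0 only for an isolated edge (deg(u) = deg(v) = 1): o(e) = 0
        o[(u, v)] = t / denom if denom > 0 else 0.0
    return o


# Hypothetical graph: 4-clique {a, b, c, d}, pendant edge d-e, isolated edge x-y.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e"), ("x", "y")]
o = overlap_weight(edges)
print(o[("a", "b")], o[("c", "d")], o[("x", "y")])   # 1.0, about 0.667, 0.0
```

The edge a-b inside the small clique already reaches the maximal value 1, which illustrates the behavior discussed for the US Airports network.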

The overlap weight is essentially a Jaccard similarity index (Wikipedia, 2018)

$$J(X,Y)=\frac{|X\cap Y|}{|X\cup Y|}$$

for $X=N(u)\setminus\{v\}$ and $Y=N(v)\setminus\{u\}$ where $N(z)$ is the set of neighbors of a node $z$ . In this case we have $|X\cap Y|=t(e)$ and

$$|X\cup Y|=|X|+|Y|-|X\cap Y|=(\deg(u)-1)+(\deg(v)-1)-t(e).$$

Note also that $h(X,Y)=1-J(X,Y)=\frac{|X\oplus Y|}{|X\cup Y|}$ is the normalized Hamming distance (Wikipedia, 2018). The operation $\oplus$ denotes the symmetric difference $X\oplus Y=(X\cup Y)\setminus(X\cap Y)$ .

Another normalized overlap measure is the overlap index (Wikipedia, 2018)

$$O(e)=O(X,Y)=\frac{|X\cap Y|}{\max(|X|,|Y|)}=\frac{t(e)}{\max(\deg(u),\deg(v))-1}.$$

Both measures $J$ and $O$, applied to networks, have some nice properties. For example, two nodes $u$ and $v$ are structurally equivalent iff $J(X,Y)=O(X,Y)=1$. Therefore the overlap weight measures the substitutability of one end node of an edge by the other.
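The set measures can be sketched in Python on $X=N(u)\setminus\{v\}$ and $Y=N(v)\setminus\{u\}$. This is an illustration under stated assumptions: the adjacency sets `N` and the helper `end_sets` are hypothetical, and `overlap_index` follows the max-based definition used in this paper.

```python
def jaccard(X, Y):
    """J(X, Y) = |X & Y| / |X | Y| (0 for two empty sets)."""
    union = X | Y
    return len(X & Y) / len(union) if union else 0.0


def overlap_index(X, Y):
    """O(X, Y) = |X & Y| / max(|X|, |Y|), as defined in the text."""
    largest = max(len(X), len(Y))
    return len(X & Y) / largest if largest else 0.0


def hamming(X, Y):
    """Normalized Hamming distance |X ^ Y| / |X | Y|, which equals 1 - J(X, Y)."""
    union = X | Y
    return len(X ^ Y) / len(union) if union else 0.0


# Hypothetical adjacency sets: 4-clique {a, b, c, d} plus pendant edge d-e.
N = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
     "d": {"a", "b", "c", "e"}, "e": {"d"}}


def end_sets(u, v):
    """X = N(u) without v, Y = N(v) without u, for the edge e = (u:v)."""
    return N[u] - {v}, N[v] - {u}


X, Y = end_sets("c", "d")        # X = {a, b}, Y = {a, b, e}
print(jaccard(X, Y), overlap_index(X, Y), hamming(X, Y))
X, Y = end_sets("a", "b")        # X = Y = {c, d}: structurally equivalent ends
print(jaccard(X, Y), overlap_index(X, Y))   # 1.0 1.0
```

For the edge c-d the Jaccard index is $2/3$, which is exactly $t(e)/((\deg(c)-1)+(\deg(d)-1)-t(e))=2/(2+3-2)$.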

Introducing two auxiliary quantities

$$m(e)=\min(\deg(u),\deg(v))-1\quad\mbox{and}\quad M(e)=\max(\deg(u),\deg(v))-1$$

we can rewrite the definition of the overlap weight as

$$o(e)=\frac{t(e)}{m(e)+M(e)-t(e)},\quad M(e)>0$$

and if $M(e)=0$ then $o(e)=0$ .

For every edge $e\in{\cal E}$ it holds $0\leq t(e)\leq m(e)\leq M(e)$ . Therefore

$$m(e)+M(e)-t(e)\geq t(e)+t(e)-t(e)=t(e)$$

showing that $0\leq o(e)\leq 1$ .

The value $o(e)=1$ is attained exactly in the case when $M(e)=t(e)$ ; and the value $o(e)=0$ exactly when $t(e)=0$ .
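As a worked example, take the edge neighbors subgraph $T(4,5,3)$ of Figure 1. Reading the label as $T(\deg(u)-t(e)-1,t(e),\deg(v)-t(e)-1)$ gives $\deg(u)=10$, $\deg(v)=9$ and $t(e)=5$, so

$$m(e)=9-1=8,\qquad M(e)=10-1=9,\qquad o(e)=\frac{5}{8+9-5}=\frac{5}{12}\approx 0.417 .$$

The $m(e)-t(e)=3$ nodes of the smaller neighbor set that are not common neighbors are the unused ‘potentiality’ mentioned in the second problem listed above; $o(e)=1$ would require $M(e)=t(e)$, i.e. no such nodes on either side.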

In simple directed graphs without loops different types of triangles exist over an arc $a(u,v)$ . We can define overlap weights for each type. For example: the transitive overlap weight

$$o_{t}(a)=\frac{t_{t}(a)}{(\mathrm{outdeg}(u)-1)+(\mathrm{indeg}(v)-1)-t_{t}(a)}$$

and the cyclic overlap weight

$$o_{c}(a)=\frac{t_{c}(a)}{\mathrm{indeg}(u)+\mathrm{outdeg}(v)-t_{c}(a)}$$

where $t_{t}(a)$ and $t_{c}(a)$ are the number of transitive / cyclic triangles containing the arc $a$ . In this paper we will limit our discussion to overlap weights in undirected graphs.
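A Python sketch of the two directed variants. It assumes that $t_t(a)$ counts the nodes $w$ with arcs $u\to w$ and $w\to v$ (the arc $a$ closes a transitive triangle) and that $t_c(a)$ counts the nodes $w$ with $v\to w$ and $w\to u$ (the arc $a$ lies on a directed 3-cycle). This reading matches the denominators above but is our interpretation; the function name and the arc list are hypothetical.

```python
def directed_overlaps(arcs):
    """Return (o_t(a), o_c(a)) for every arc a = (u, v) of a simple digraph without loops."""
    out_n, in_n = {}, {}
    for u, v in arcs:
        out_n.setdefault(u, set()).add(v)
        in_n.setdefault(v, set()).add(u)
        out_n.setdefault(v, set())    # make sure every node has both sets
        in_n.setdefault(u, set())
    result = {}
    for u, v in arcs:
        tt = len(out_n[u] & in_n[v])  # w with u -> w -> v (transitive triangle)
        tc = len(out_n[v] & in_n[u])  # w with v -> w -> u (cyclic triangle)
        dt = (len(out_n[u]) - 1) + (len(in_n[v]) - 1) - tt
        dc = len(in_n[u]) + len(out_n[v]) - tc
        result[(u, v)] = (tt / dt if dt > 0 else 0.0, tc / dc if dc > 0 else 0.0)
    return result


# Hypothetical digraph: a -> c is the shortcut of a -> b -> c and lies on a -> c -> d -> a.
arcs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "a")]
w = directed_overlaps(arcs)
print(w[("a", "c")], w[("c", "d")])   # (1.0, 1.0) (0.0, 0.5)
```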

2.2 US Airports links with the largest overlap weight

Let us apply the overlap weight to the network of US Airports 1997 (Batagelj and Mrvar, 2006). It consists of 332 airports and 2126 edges among them. There is an edge linking a pair of airports iff in the year 1997 there was a flight company providing flights between those two airports.

The size of a circle representing an airport in Figure 2 is proportional to its degree – the number of airports linked to it. The airports with the largest degree are:

[TABLE]

For the overlap weight the edge cut at level 0.8 (a subnetwork of all edges with overlap weight at least 0.8) is presented in Figure 3. It consists of two triangles, a path of length 2, and 17 separate edges.

A tetrahedron (Kwigillingok, Kongiganak, Tuntutuliak, Bethel), see Figure 4, gives the first triangle in Figure 3; it is attached to the rest of the network through the node Bethel.

From this example we see that in real-life networks the edges with the largest overlap weight tend to be edges with relatively small degrees of their end nodes ($o(e)=1$ implies $\deg(u)=\deg(v)=t(e)+1$); the overlap weight does not satisfy the condition ld3. Because of this, the overlap weight is not very useful for the data analytic task of searching for important elements of a given network. We would like to emphasize that there are many applications in which the overlap weight proves to be useful and appropriate; we question only its appropriateness for determining the most overlapped edges. We will try to improve the overlap weight definition to better suit the data analytic goals.

2.3 Corrected overlap weight

We define a corrected overlap weight as

$$o^{\prime}(e)=\frac{t(e)}{\mu+M(e)-t(e)}.$$

By the definition of $\mu$, for every $e\in{\cal E}$ it holds $t(e)\leq\mu$. Since $M(e)-t(e)\geq 0$, also $\mu+M(e)-t(e)\geq\mu\geq t(e)$, and therefore ld2 holds: $0\leq o^{\prime}(e)\leq 1$. $o^{\prime}(e)=0$ exactly when $t(e)=0$, and $o^{\prime}(e)=1$ exactly when $\mu=M(e)=t(e)$. For ld3, the corresponding maximal edge neighbors subgraph contains $T(0,\mu,0)$: the end nodes of the edge $e$ are structurally equivalent.
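A minimal Python sketch of the corrected overlap weight; the function name and the toy graph (a 4-clique next to a separate triangle) are hypothetical.

```python
def corrected_overlap(edges):
    """Return mu and the corrected overlap weight o'(e) for every edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    t = {(u, v): len(adj[u] & adj[v]) for u, v in edges}
    mu = max(t.values())                      # maximum number of triangles on an edge
    o_corr = {}
    for (u, v), te in t.items():
        M = max(len(adj[u]), len(adj[v])) - 1
        denom = mu + M - te                   # >= mu >= te; zero only if mu = M = 0
        o_corr[(u, v)] = te / denom if denom > 0 else 0.0
    return mu, o_corr


# Hypothetical graph: a 4-clique {a, b, c, d} and a separate triangle {x, y, z}.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("x", "y"), ("x", "z"), ("y", "z")]
mu, o_corr = corrected_overlap(edges)
print(mu, o_corr[("a", "b")], o_corr[("x", "y")])   # 2 1.0 0.5
```

With the uncorrected weight both the clique edges and the triangle edges get $o(e)=1$; the correction keeps the value 1 only on the edges of the larger clique, in line with ld3.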

To show that ld1 also holds, let $\mathbf{G}^{(1)}(e)$ denote the edge neighbors subgraph of the edge $e$, and let $f$ be the edge added to $\mathbf{G}^{(1)}(e)$. We can assume that $\deg(u)\geq\deg(v)$, $e=(u:v)$; therefore $M(e)=\deg(u)-1$. We consider the following cases:

a. $f\in{\cal E}(\mathbf{G}^{(1)}(e))$ : then $\mathbf{G}\cup f=\mathbf{G}$ and $o^{\prime}(e,\mathbf{G}\cup f)=o^{\prime}(e,\mathbf{G})$ .

b. $f\notin{\cal E}(\mathbf{G}^{(1)}(e))$ :

b1. $f=(u:t)$: then $t\in N(v)\setminus T(e)\setminus e$. It creates a new triangle $(u,v,t)$. We have $t^{\prime}(e)=t(e)+1$ and $M^{\prime}(e)=M(e)+1$. We get

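$$o^{\prime}(e,\mathbf{G}\cup f)=\frac{t^{\prime}(e)}{\mu+M^{\prime}(e)-t^{\prime}(e)}=\frac{t(e)+1}{\mu+M(e)-t(e)}>o^{\prime}(e,\mathbf{G}).$$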

b2. $f=(v:t)$: then $t\in N(u)\setminus T(e)\setminus e$. It creates a new triangle $(u,v,t)$. We have $t^{\prime}(e)=t(e)+1$ and $M^{\prime}(e)=M(e)$ (if $\deg(v)=\deg(u)$, the roles of $u$ and $v$ can be exchanged and case b1 applies). We get

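$$o^{\prime}(e,\mathbf{G}\cup f)=\frac{t^{\prime}(e)}{\mu+M^{\prime}(e)-t^{\prime}(e)}=\frac{t(e)+1}{\mu+M(e)-t(e)-1}>\frac{t(e)+1}{\mu+M(e)-t(e)}>o^{\prime}(e,\mathbf{G}).$$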

b3. $f=(t:w)$ and $t,w\in N(u)\cup N(v)\setminus\{u,v\}$ : No new triangle on $e$ is created. We have $t^{\prime}(e)=t(e)$ and $M^{\prime}(e)=M(e)$ . Therefore $o^{\prime}(e,\mathbf{G}\cup f)=o^{\prime}(e,\mathbf{G})$ .
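The case analysis can be checked numerically on a small example. The sketch below is hypothetical: it fixes $\mu=5$ by hand (as the case analysis treats $\mu$ as a constant, standing in for a larger surrounding network), uses an edge $e=(u:v)$ with $\deg(u)=4\geq\deg(v)=3$ and one common neighbor, and adds one edge $f$ for each of the cases b1, b2 and b3.

```python
def o_prime(adj, u, v, mu):
    """Corrected overlap weight of e = (u:v) for a given, fixed mu."""
    t = len(adj[u] & adj[v])
    M = max(len(adj[u]), len(adj[v])) - 1
    denom = mu + M - t
    return t / denom if denom > 0 else 0.0


def with_edge(adj, a, b):
    """Copy of the adjacency sets with the undirected edge (a:b) added."""
    new = {k: set(s) for k, s in adj.items()}
    new[a].add(b)
    new[b].add(a)
    return new


# Hypothetical graph: e = (u:v), common neighbor c, u also adjacent to p, q; v also to r.
adj = {"u": {"v", "c", "p", "q"}, "v": {"u", "c", "r"},
       "c": {"u", "v"}, "p": {"u"}, "q": {"u"}, "r": {"v"}}
MU = 5  # assumed global maximum of t(e), held fixed

base = o_prime(adj, "u", "v", MU)                      # 1 / (5 + 3 - 1) = 1/7
b1 = o_prime(with_edge(adj, "u", "r"), "u", "v", MU)   # 2 / (5 + 4 - 2) = 2/7
b2 = o_prime(with_edge(adj, "v", "p"), "u", "v", MU)   # 2 / (5 + 3 - 2) = 1/3
b3 = o_prime(with_edge(adj, "p", "r"), "u", "v", MU)   # unchanged: 1/7
print(base, b1, b2, b3)
```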

The corrected overlap weight $o^{\prime}$ is a kind of local density measure, but it is primarily a substitutability measure. To get a better local density measure we would have to consider quadrilaterals (4-cycles) in addition to triangles.

2.4 US Airports 1997 links with the largest corrected overlap weight

For the US Airports 1997 network we get $\mu=80$ . For the corrected overlap weight the edge cut at level 0.5 is presented in Figure 5. Six links with the largest triangular weights are given in Table 1.

In Figure 6 all the neighbors of the end nodes WB Hartsfield Atlanta and Charlotte/Douglas Intl of the link with the largest corrected overlap weight are presented. They have 76 common (triangular) neighbors. The node WB Hartsfield Atlanta has 25 and the node Charlotte/Douglas Intl has 11 additional neighbors. Note (see Table 1) that some links have a higher triangular weight, but also a much higher number of additional neighbors, and therefore a smaller corrected overlap weight.
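The entries of Table 1 follow directly from the definition of $o^{\prime}$ with $\mu=80$. For the link between WB Hartsfield Atlanta ($\deg=101$) and Charlotte/Douglas Intl ($\deg=87$) we have $M(e)=100$ and $t(e)=76$:

$$o^{\prime}(e)=\frac{76}{80+100-76}=\frac{76}{104}\approx 0.73077,$$

while for Chicago O’hare Intl ($\deg=139$) and Pittsburgh ($\deg=94$), with $t(e)=80$ and $M(e)=138$,

$$o^{\prime}(e)=\frac{80}{80+138-80}=\frac{80}{138}\approx 0.57971 .$$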

2.5 Comparisons

In Figure 7 the set $\{(o(e),o^{\prime}(e)):e\in{\cal E}\}$ is displayed for the US Airports 1997 network. For most edges it holds $o^{\prime}(e)\leq o(e)$ . It is easy to see that $o(e)<o^{\prime}(e)\Leftrightarrow\mu<m(e)$ . Edges with the overlap value $o(e)>0.8$ have the corrected overlap weight $o^{\prime}(e)<0.2$ .
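The equivalence follows by comparing denominators. For $t(e)>0$ both weights have the same positive numerator, and both denominators are positive ($m(e)+M(e)-t(e)\geq t(e)$ and $\mu+M(e)-t(e)\geq\mu\geq t(e)$), so

$$o(e)<o^{\prime}(e)\iff\mu+M(e)-t(e)<m(e)+M(e)-t(e)\iff\mu<m(e).$$

For $t(e)=0$ both weights are equal to 0.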

In Figure 8 the sets $\{(m(e),o(e)):e\in{\cal E}\}$ and $\{(m(e),o^{\prime}(e)):e\in{\cal E}\}$ are displayed for the US Airports 1997 network. As $m(e)$ increases, the overlap weight $o(e)$ decreases, while the corrected overlap weight $o^{\prime}(e)$ increases.

We can observe similar tendencies if we compare both weights with respect to the number of triangles $t(e)$ (see Figure 9).

3 Clustering coefficient

3.1 Clustering coefficient

For a node $u\in{\cal V}$ in an undirected simple graph $\mathbf{G}=({\cal V},{\cal E})$, its (local) clustering coefficient (Wikipedia, 2018) measures the local density at the node $u$. It is defined as the proportion of the number of existing edges between $u$’s neighbors to the number of all possible edges between $u$’s neighbors

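$$cc(u)=\frac{|{\cal E}(N(u))|}{|{\cal E}(K_{\deg(u)})|}=\frac{2\cdot E(u)}{\deg(u)\cdot(\deg(u)-1)},\quad\deg(u)>1$$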

where $E(u)=|{\cal E}(N(u))|$ . If $\deg(u)\leq 1$ then $cc(u)=0$ .

It is easy to see that

$$E(u)=\frac{1}{2}\sum_{e\in S(u)}t(e)$$

where $S(u)=\{e(u:v):e\in{\cal E}\}$ is the star in node $u$ .
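The identity for $E(u)$ means the clustering coefficient can be computed from the triangle weights on the star of $u$. A minimal Python sketch; the function name and the toy graph are hypothetical.

```python
def clustering(edges):
    """Local clustering coefficient cc(u), with E(u) obtained from triangle counts."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cc = {}
    for u, nbrs in adj.items():
        # 2 E(u) = sum of t(e) over the star S(u): each edge among the
        # neighbors closes one triangle and is counted on two star edges.
        two_E = sum(len(adj[u] & adj[v]) for v in nbrs)
        d = len(nbrs)
        cc[u] = two_E / (d * (d - 1)) if d > 1 else 0.0   # cc(u) = 0 if deg(u) <= 1
    return cc


# Hypothetical toy graph: a 4-clique {a, b, c, d} plus a pendant edge d-e.
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
cc = clustering(edges)
print(cc["a"], cc["d"], cc["e"])   # 1.0 0.5 0.0
```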

It holds $0\leq cc(u)\leq 1$ ; $cc(u)=1$ exactly when ${\cal E}(N(u))$ is isomorphic to $K_{\deg(u)}$ – a complete graph on $\deg(u)$ nodes. Therefore it seems that the clustering coefficient could be used to identify nodes with the densest neighborhoods.

The notion of clustering coefficient can be extended also to simple directed graphs (with loops).

3.2 US Airports with the largest clustering coefficient

Let us apply also the clustering coefficient to the US Airports 1997 network.

In Table 2 the airports with the clustering coefficient equal to 1 and degree at least 4 are listed. There are 28 additional such airports with degree 3, and 38 with degree 2.

Again we see that the clustering coefficient attains its largest value in nodes with relatively small degree. The probability that $N(u)$ induces a complete subgraph decreases very fast as $\deg(u)$ increases. The clustering coefficient does not satisfy the condition ld3.

3.3 Corrected clustering coefficient

To get a corrected version of the clustering coefficient we proposed in Pajek (De Nooy et al., 2018) to replace $\deg(u)$ in the denominator with $\Delta=\max_{v\in{\cal V}}\deg(v)$ . In this paper we propose another solution – we replace $\deg(u)-1$ with $\mu$ :

$$cc^{\prime}(u)=\frac{2\cdot E(u)}{\mu\cdot\deg(u)},\quad\deg(u)>0$$

If $\deg(u)=0$ then $cc^{\prime}(u)=0$ . Note that, if $\Delta>0$ then $\mu<\Delta$ .
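A minimal Python sketch of the corrected clustering coefficient, again computing $2\cdot E(u)$ from triangle weights. The function name and the toy graph (a 4-clique next to a separate triangle) are hypothetical.

```python
def corrected_clustering(edges):
    """Return mu and cc'(u) = 2 E(u) / (mu * deg(u)) for every node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # mu: the maximum number of triangles on an edge of the graph
    mu = max(len(adj[u] & adj[v]) for u, v in edges)
    cc_corr = {}
    for u, nbrs in adj.items():
        two_E = sum(len(adj[u] & adj[v]) for v in nbrs)   # 2 E(u)
        d = len(nbrs)
        cc_corr[u] = two_E / (mu * d) if d > 0 and mu > 0 else 0.0
    return mu, cc_corr


# Hypothetical graph: a 4-clique {a, b, c, d} and a separate triangle {x, y, z}.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("x", "y"), ("x", "z"), ("y", "z")]
mu, cc_corr = corrected_clustering(edges)
print(mu, cc_corr["a"], cc_corr["x"])   # 2 1.0 0.5
```

The ordinary clustering coefficient equals 1 at every node of both cliques; the corrected one keeps the value 1 only in the larger clique.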

To verify the property ld1 we add to $\mathbf{G}(u)$ a new edge $f$ with its end nodes in $\mathbf{G}(u)$ . Then $E^{\prime}(u)=E(u)+1$ and $\deg^{\prime}(u)=\deg(u)$ . Therefore

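$$cc^{\prime}(u,\mathbf{G}\cup f)=\frac{2\cdot E^{\prime}(u)}{\mu\cdot\deg^{\prime}(u)}=\frac{2\cdot(E(u)+1)}{\mu\cdot\deg(u)}>cc^{\prime}(u,\mathbf{G})$$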

To show the property ld2, $0\leq cc^{\prime}(u)\leq 1$ , we have to consider two cases:

a.

$\deg(u)\geq\mu$ : then for $v\in N(u)$ we have $\deg_{N(u)}(v)\leq\mu$ and therefore

$$2\cdot E(u)=\sum_{v\in N(u)}\deg_{N(u)}(v)\leq\sum_{v\in N(u)}\mu=\mu\cdot\deg(u)$$

b.

$\deg(u)<\mu$ : then $\deg(u)-1\leq\mu$ and therefore

$$2\cdot E(u)\leq\deg(u)\cdot(\deg(u)-1)\leq\mu\cdot\deg(u)$$

For the property ld3, the value $cc^{\prime}(u)=1$ is attained in the case a on a $\mu$ -core, and in the case b on $K_{\mu+1}$ .

3.4 US Airports nodes with the largest corrected clustering coefficient

In Table 3 the US Airports with the largest corrected clustering coefficient are listed. The largest value, 0.3739, is attained for Cleveland-Hopkins Intl airport. In Figure 10 the adjacency matrix of the subnetwork on its 45 neighbors is presented. The subnetwork is fairly dense. The relatively small value of the corrected clustering coefficient is due to the relatively small degree, $\deg=45$, compared with $\mu=80$.
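The reported value can be unpacked with the definition of $cc^{\prime}$. For Cleveland-Hopkins Intl,

$$2\cdot E(u)=cc^{\prime}(u)\cdot\mu\cdot\deg(u)\approx 0.3739\cdot 80\cdot 45\approx 1346,$$

so its 45 neighbors are linked by about $E(u)\approx 673$ of the $45\cdot 44/2=990$ possible edges, i.e. an ordinary clustering coefficient of roughly $673/990\approx 0.68$. Even a complete neighborhood would give only $cc^{\prime}(u)=(\deg(u)-1)/\mu=44/80=0.55$.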

3.5 Comparisons

In Figure 11 the set $\{(cc(u),cc^{\prime}(u)):u\in{\cal V}\}$ is displayed for the US Airports 1997 network. The correlation between the two coefficients is very small. An important observation is that nodes with the largest value of the clustering coefficient have relatively small values of the corrected clustering coefficient. We also see that the number of edges in a node’s neighborhood is almost functionally dependent on its degree.

From Figure 12 we see that the clustering coefficient decreases with increasing degree: nodes with large degree have small values of the clustering coefficient. The values of the corrected clustering coefficient are large for nodes of large degree.

4 Conclusions

In the paper we showed that two network measures, the overlap weight and clustering coefficient, are not suitable for the data analytic task of determining important elements in a given network. We proposed corrected versions of these two measures that give expected results.

Because $\mu\leq\Delta$, we can replace $\mu$ with $\Delta$ in the corrected measures. The advantage is that $\Delta$ is easier to compute, but the corresponding corrected index is less ‘sensitive’.

An interesting task for future research is a comparison of the proposed measures with measures from graph drawing (Melançon and Sallaberry, 2008; Nocaj et al., 2015, 2016).

Acknowledgments

The computations were done combining Pajek (De Nooy et al., 2018) with short programs in Python and R (Batagelj, 2016).

This work is supported in part by the Slovenian Research Agency (research program P1-0294 and research projects J1-9187, and J7-8279) and by Russian Academic Excellence Project ’5-100’.

The paper is a detailed and extended version of the talk presented at the CMStatistics (ERCIM) 2015 Conference. The author’s attendance at the conference was partially supported by the COST Action IC1408 – CRoNoS.

Bibliography


Batagelj, V., Mrvar, A. (2006). Pajek data sets: US Airports network: http://vlado.fmf.uni-lj.si/pub/networks/data/mix/US Air 97.net .
Batagelj, V. (2016). Corrected. https://github.com/bavla/corrected .
De Nooy, W., Mrvar, A., Batagelj, V. (2018). Exploratory Social Network Analysis with Pajek; Revised and Expanded Edition for Updated Software. Structural Analysis in the Social Sciences, Cambridge University Press.
Holland, P.W., Leinhardt, S. (1971). Transitivity in structural models of small groups. Comparative Group Studies 2: 107–124.
Melançon, G., Sallaberry, A. (2008). Edge Metrics for Visual Graph Analytics: A Comparative Study. 12th International Conference Information Visualisation, 610–615.
Nocaj, A., Ortmann, M., Brandes, U. (2015). Untangling the Hairballs of Multi-Centered, Small-World Online Social Media Networks. Journal of Graph Algorithms and Applications 19(2), 595–618.
Nocaj, A., Ortmann, M., Brandes, U. (2016). Adaptive Disentanglement Based on Local Clustering in Small-World Network Visualization. IEEE Transactions on Visualization and Computer Graphics 22(6), 1662–1671.
Onnela, J.P., Saramäki, J., Hyvönen, J., Szabó, G., Lazer, D., Kaski, K., Kertész, J., Barabási, A.L. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences 104(18), 7332.