Anti-modular nature of partially bipartite networks makes them infra small-world
Aradhana Singh, Md. Izhar Ashraf, Sitabhra Sinha

TL;DR
This paper reveals that anti-modular partially bipartite networks exhibit unique structural properties, such as being infra small-world, with higher efficiency and lower clustering, differing fundamentally from traditional network paradigms.
Contribution
It demonstrates how anti-modularity leads to distinct spectral and structural features, including a delocalization transition and bimodal PEV distribution, expanding understanding of complex network organization.
Findings
Anti-modular networks are infra small-world with high efficiency.
Spectral analysis shows a delocalization transition in anti-modular networks.
Bimodal PEV distribution serves as a signature of anti-modularity.
Aradhana Singh1, Md. Izhar Ashraf1,2 and Sitabhra Sinha1,3
1The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai 600113, India.
2 BS Abdur Rahman Crescent Institute of Science & Technology, Vandalur, Chennai 600048, India.
3Homi Bhabha National Institute, Anushaktinagar, Mumbai 400094, India.
Abstract
Strong inter-dependence in complex systems can manifest as partially bipartite networks characterized by interactions occurring primarily between distinct groups of nodes (identified as modules). In this paper, we show that the anti-modular character of such networks, e.g., those defined by the adjacent occurrence of alphabetic characters in corpora of natural language texts, can result in striking structural properties which place them outside the well-known regular/small-world/random network paradigm. Using an ensemble of model networks whose modularity can be tuned, we demonstrate that strong module size heterogeneity in anti-modular random networks imparts them with higher communication efficiency and lower clustering than their randomized counterparts, making them infra small-world. Passage to anti-modularity is associated with characteristic changes in spectral properties of the network, including a delocalization transition exhibited by the principal eigenvector (PEV) of the normalized Laplacian. This is accompanied by the emergence of prominent bimodality in the distribution of PEV components, which can function as a signature for identifying anti-modular organization in empirical networks.
Many complex systems that occur around us can be described in terms of a network of a large number of interacting components Newman_book ; Barabasi_book ; structure_dynamics_book . The connection topology characterizing these interactions for different systems is one of the most important factors determining their properties. A dominant paradigm in this context is that of small-world (SW) networks Watts1998 ; Newman2000 ; Vespignani2018 spanning a wide range of possible topological structures that are distinguished by the global property of communication efficiency (measured by the mean of the inverse of all pairwise distances between the constituent nodes) Latora2001 and the local property of clustering (quantified as the ratio of the numbers of connected node triads to potential triads) Newman2009 . Lattices or regular networks, which have low efficiency and high clustering, and Erdős-Rényi (ER) random networks, which show high efficiency with low clustering, form the two well-understood extreme limits of the range of structures encompassed by this paradigm. Small-world properties have subsequently been reported for a broad range of empirical networks (see, e.g., Refs. Albert1999 ; Newman2001 ; Wagner2001 ; Bullmore2009 ). Indeed, SW networks are far more general than the context of interpolation between regular and random networks in which they were originally proposed Watts1998 . In particular, modular networks, characterized by the existence of subnetworks within which connection density is significantly higher than that for the entire system, have been shown to exhibit small-world properties Pan09 . It is therefore of interest to ask if there are networks which fall outside this paradigm, or more aptly, whether the class of small-world networks is itself part of an even more general framework for describing connection topologies.
A particular class of empirical networks that do not appear to fit into the regular/small-world/random (Rg/SW/Rn) spectrum are defined by the adjacent occurrence of characters in texts of different natural languages which use alphabetic writing systems note1 . Fig. 1 (a-b) shows that these networks are partially bipartite, comprising two clusters (consisting of vowels and consonants, respectively). Most links occur between these two distinct types of nodes and comparatively few connect nodes of the same type. We find that these networks have anti-modular character, suggested by the block structure of their adjacency matrices A ($A_{ij} = 1$ if nodes $i$ and $j$ are connected, $0$ otherwise) with relatively sparsely populated diagonal blocks and dense off-diagonal ones. This is verified by the negative values of the index $Q$, a quantitative measure for network modularity Newman_spectra , for the empirical networks [Fig. 1 (c)]. The macroscopic properties of these anti-modular networks show co-occurrence of extremely low clustering (even lower than corresponding degree-preserved randomized networks) with high efficiency [Fig. 1 (d)]. It is worth noting that in the usual Rg/SW/Rn paradigm the lowest clustering and the highest efficiency that can be achieved correspond to those of ER random networks, which form one of the extreme ends of the small-world spectrum. However, the empirical networks with anti-modular character have even lower clustering and, in some cases, marginally higher efficiency, and are thus even “smaller” than the random graphs. We thus term them infra small-world.
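The sign of the modularity index can be illustrated with a minimal sketch. It assumes the standard Newman definition of modularity for a given two-group partition; the function name `modularity` and the toy vowel/consonant graph are hypothetical and are not taken from the empirical data.

```python
def modularity(edges, membership):
    """Newman modularity of an undirected simple graph for a given partition.

    edges: list of (u, v) pairs, each undirected link listed once.
    membership: dict mapping node -> group label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of links that fall inside a group.
    intra = sum(1 for u, v in edges if membership[u] == membership[v])
    # Total degree carried by each group.
    group_degree = {}
    for node, k in degree.items():
        label = membership[node]
        group_degree[label] = group_degree.get(label, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in group_degree.values())
    return intra / m - expected


# Hypothetical toy graph: every "vowel" linked to every "consonant" (fully bipartite).
edges = [(a, b) for a in ("a", "e") for b in ("k", "s", "t")]
membership = {"a": 0, "e": 0, "k": 1, "s": 1, "t": 1}
q_bipartite = modularity(edges, membership)  # negative: no links inside either group
```

For a fully bipartite graph partitioned along its two sides, no link lies inside a group, so the index reduces to minus the expected intra-group fraction and is negative.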
In this paper, we show that, in general, networks comprising two modules have infra-small world character if the ratio of inter- to intra-module connection density is varied so as to make them anti-modular, with heterogeneity in module sizes making this behavior very prominent. In order to investigate the properties of such infra-SW networks in a more detailed and systematic manner, we consider an ensemble of model networks. The topological organization of the network connections can be tuned so as to change the mesoscopic structure from modular to random and then to anti-modular (without altering the average degree of the network) by gradually increasing the ratio $r$ of inter- to intra-modular connection density Pan09 . In order to consider heterogeneity in the size of the modules, we consider that the $N$ nodes of each network are divided among two modules having sizes $n_1$ and $n_2$ ($= N - n_1$), respectively. The size $n_1$ is randomly sampled from a Gaussian distribution with a mean of $N/2$ and whose sample standard deviation $\sigma$ is a free parameter quantifying the size heterogeneity pathak . It is easy to show that the expected sizes of the two modules are [math]. By specifying the model parameters estimated from empirical networks, we can construct corresponding model network ensembles which have similar mesoscopic properties note1 . Fig. 1 (c) shows that the model networks generated using parameters estimated from the orthographic networks of different languages reproduce the mesoscopic nature of the latter fairly accurately.
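A two-module ensemble of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: every pair of nodes is linked independently, with probability `rho_in` inside a module and `rho_out = r * rho_in` across modules, and `rho_in` is fixed so that the expected mean degree stays constant as `r` changes. All function and parameter names are hypothetical, and the rounding and clipping of the sampled module size is a choice made here.

```python
import random


def connection_probabilities(n1, n2, mean_degree, r):
    """Intra- and inter-module link probabilities for a fixed expected mean degree."""
    n = n1 + n2
    pairs_in = n1 * (n1 - 1) / 2 + n2 * (n2 - 1) / 2
    pairs_out = n1 * n2
    # Expected number of links: rho_in * pairs_in + r * rho_in * pairs_out = n * <k> / 2.
    rho_in = (n * mean_degree / 2) / (pairs_in + r * pairs_out)
    rho_out = r * rho_in
    if rho_in > 1 or rho_out > 1:
        raise ValueError("mean degree too large for these module sizes and r")
    return rho_in, rho_out


def sample_module_sizes(n, sigma, rng):
    """Draw n1 from a Gaussian with mean n/2 and s.d. sigma (rounded, clipped to [1, n-1])."""
    n1 = min(max(int(round(rng.gauss(n / 2, sigma))), 1), n - 1)
    return n1, n - n1


def two_module_network(n1, n2, mean_degree, r, rng):
    """Return an edge list; nodes 0..n1-1 form module 1, the remaining nodes module 2."""
    rho_in, rho_out = connection_probabilities(n1, n2, mean_degree, r)
    edges = []
    for i in range(n1 + n2):
        for j in range(i + 1, n1 + n2):
            same_module = (i < n1) == (j < n1)
            if rng.random() < (rho_in if same_module else rho_out):
                edges.append((i, j))
    return edges


rng = random.Random(0)
# Strongly heterogeneous, anti-modular example: r > 1 favours links between modules.
edges = two_module_network(10, 90, mean_degree=10, r=20.0, rng=rng)
```

Setting `r = 1` in this sketch gives a homogeneous random graph, while `r < 1` gives a modular one; very large `r` requires the mean degree to be small enough that `rho_out` does not exceed 1.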
To demonstrate that the infra-small world nature is associated with anti-modular character of a network, we now characterize the model networks in terms of their principal macroscopic properties. Fig. 2 (a) and (b) show the variation of the global efficiency $E$ and the average clustering coefficient $C$ as the mesoscopic structure of the network is changed by varying $r$. When the network is modular ($r < 1$), $E$ is lower and $C$ higher than the corresponding values for homogeneous random networks ($r = 1$), consistent with earlier observations that modular networks are small-world Pan09 . On increasing $r$ beyond $1$, the network becomes anti-modular and approaches complete bipartivity as $r \to \infty$. This makes connected triads increasingly unlikely, thereby decreasing the clustering to values even lower than that of ER random networks. The efficiency of anti-modular networks, on the other hand, depends on the extent of heterogeneity in module sizes. When module sizes are similar (i.e., low $\sigma$), $E$ is seen to decrease monotonically from the maximum reached for $r = 1$. However, for high $\sigma$, when module sizes are very different, the efficiency attains values even higher than that seen for the ER random networks. This can be connected with the emergence of a bimodal degree distribution in these networks with increasing $\sigma$ [Fig. 2 (f)], with the smaller of the two modules comprising extremely high degree nodes. These act as hubs connecting the entire network via extremely short paths that pass through them, as indicated by the increased value of the maximum betweenness centrality for such systems [Fig. 2 (c)]. As mentioned earlier, these anti-modular networks having higher $E$ and lower $C$ compared to homogeneous random networks thus lie beyond the spectrum of SW networks.
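The two macroscopic quantities discussed above can be computed with a short sketch. It assumes the usual definitions: global efficiency as the mean inverse shortest-path length over all ordered node pairs (unreachable pairs contribute zero), and the average of the local clustering coefficients, with nodes of degree below two counted as zero. The helper names are hypothetical.

```python
from collections import deque


def adjacency(n, edges):
    """Adjacency sets for an undirected graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def global_efficiency(adj):
    """Mean of 1/d(i, j) over all ordered pairs i != j, using breadth-first search."""
    n = len(adj)
    total = 0.0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def average_clustering(adj):
    """Average over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# A star (one hub, three leaves) is bipartite: short paths through the hub, no triangles.
star = adjacency(4, [(0, 1), (0, 2), (0, 3)])
star_efficiency = global_efficiency(star)
star_clustering = average_clustering(star)
```

The star example mirrors the mechanism described in the text: a hub keeps all paths short while the absence of links among its neighbours drives the clustering to zero.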
As it is known that higher efficiency enhances global synchronization Barahona2002 while high clustering hinders it McGraw2005 , such infra-SW networks have potential utility in applications where extremely rapid synchronization of activity over the entire system (even faster than in random networks) is required.
Further insight into the structural changes that the networks undergo as $r$ is increased can be obtained by considering their degree homophily, i.e., whether connected nodes have a similar number of links, measured by the Pearson correlation coefficient of degree between pairs of linked nodes Assor_Newman . Fig. 2 (d) shows that when the networks are highly modular, they tend to be degree assortative (i.e., the coefficient is positive), particularly when module size heterogeneity is strong. Indeed it is known that a network where high degree nodes belong to one module while those of lower degree belong to another exhibits degree assortativity Newman_spectra . This is consistent with the bimodal degree distribution for modular networks shown in Fig. 2 (e), suggesting that the lower and higher peaks of the distribution correspond to the smaller and larger modules, respectively (see details below). When $r$ is increased beyond $1$, making the networks anti-modular, they become disassortative (i.e., the coefficient is negative) when module sizes are unequal. These also have bimodal degree distribution [Fig. 2 (f)], but with the lower (higher) peak now associated with the larger (smaller) module. The existence of disassortativity suggests that the anti-modular networks have star-like structures NewmanGirvan2003 , with each high degree node of the smaller module preferring to connect to a large number of low degree nodes in the bigger module.
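The degree correlation described above can be sketched in a few lines. This assumes the standard construction in which each undirected link contributes both ordered pairs of end-point degrees to a Pearson correlation; the function name is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each undirected link is counted in both directions, so both ends share one mean.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    if var == 0:
        return float("nan")  # all end-point degrees equal: correlation undefined
    return cov / var


# A star is the extreme disassortative case: the hub links only to degree-1 nodes.
star_assortativity = degree_assortativity([(0, 1), (0, 2), (0, 3)])
```

A star-like structure gives the most negative value possible, which is the limiting behaviour the text associates with strongly heterogeneous anti-modular networks.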
For a network with two modules of unequal sizes and having overall mean degree $\langle k \rangle$, the average number of connections for the nodes in each of the two modules can be very different, viz., [math] and [math], where [math] is the intra-module connection probability. For strong module size heterogeneity, as the size of the network becomes large (i.e., as $N \to \infty$), [math] while [math]. From these expressions, it follows that in modular networks ($r < 1$), the larger of the two modules has a degree distribution centered about a value close to [math] while the nodal degrees of the smaller module approach [math] as [math]. On the other hand, for anti-modular networks, the average degree of the larger module decreases asymptotically to [math] as the network becomes completely bipartite, while that of the nodes in the smaller module initially increases linearly with [math] and eventually saturates to [math] as [math]. This suggests that for strong module size heterogeneity, the nodes in the smaller module will have degree [math], making them hubs. Thus, as the network organization changes from modular to anti-modular, its topological structure alters from being composed of two relatively weakly connected clusters, each having dense intra-connectivity, to one having a few hubs that tend to avoid connecting to each other. We note in passing that networks with bimodal degree distribution have been shown to be dynamically more stable Pan07 ; Brede_Sinha , as well as robust with respect to breakdowns and attacks Tanizawa2005 .
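The module-wise mean degrees can be worked out explicitly under one simple set of assumptions, which may differ in form from the expressions used in the paper: each intra-module pair is linked independently with probability $\rho_i$, each inter-module pair with probability $\rho_o = r \rho_i$, and the overall mean degree $\langle k \rangle$ is held fixed. The symbols $\rho_i$, $\rho_o$, $n_1$, $n_2$, $\langle k_1 \rangle$ and $\langle k_2 \rangle$ are notation introduced here for illustration.

```latex
\begin{align}
\langle k_1 \rangle &= \rho_i \left[ (n_1 - 1) + r\, n_2 \right], \\
\langle k_2 \rangle &= \rho_i \left[ (n_2 - 1) + r\, n_1 \right], \\
N \langle k \rangle &= n_1 \langle k_1 \rangle + n_2 \langle k_2 \rangle
  \;\Rightarrow\;
  \rho_i = \frac{N \langle k \rangle}{n_1 (n_1 - 1) + n_2 (n_2 - 1) + 2 r\, n_1 n_2}, \\
r \to \infty : \quad
\langle k_1 \rangle &\to \frac{N \langle k \rangle}{2 n_1}, \qquad
\langle k_2 \rangle \to \frac{N \langle k \rangle}{2 n_2}.
\end{align}
```

Under these assumptions, if $n_1 \ll n_2 \approx N$ the larger module ends up with a mean degree close to $\langle k \rangle / 2$ in the bipartite limit, while nodes of the smaller module reach a degree of order $N \langle k \rangle / (2 n_1) \gg \langle k \rangle$, which is the hub behaviour described in the text.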
We can estimate the average path length (which is inversely related to the global efficiency) of these networks as a function of their anti-modular character and module size heterogeneity. For [math], assuming that the local neighborhood of each node resembles a tree such that cycles do not play a prominent role in the calculation of path lengths, one can use the approximation [math]. This yields
[math]
which further simplifies to [math] in the limit $r \to \infty$ (i.e., when the network becomes completely bipartite). It is easy to see that as [math] is extremely small, the effective path length reduces to values lower than that of the equivalent ER network ([math]), providing the basis for the infra-small world property of anti-modular networks. Note that this implies that spreading processes will be much faster on partially bipartite networks than even on random ones. This is important to consider, e.g., for epidemic propagation in livestock populations whose transport between farms and markets forms a partially bipartite network Kiss2006 .
As mentioned earlier, strong module size heterogeneity in the anti-modular regime results in the formation of star-like structures with nodes of the smaller module acting as hubs. As many of the nodes in the larger module connect to the same set of hubs, we can cluster them into groups of nodes having identical neighborhoods [Fig. 3 (a)]. The occurrence of multiple nodes that have exactly the same neighbors is reflected in the degeneracy of the unity eigenvalues of the normalized symmetric Laplacian matrix $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$, where $I$ is the identity matrix and $D$ is a diagonal matrix with $D_{ii} = k_i$, i.e., the degree of the $i$th node, for the network Chung2003 . This can be seen in the prominent peak at $\lambda = 1$ obtained for sufficiently strong heterogeneity, e.g., curves corresponding to [math] in Fig 3 (c). This contrasts with the spectral behavior of $\mathcal{L}$ for modular networks (i.e., $r < 1$) shown in Fig. 3 (b), as well as with the semi-circle law expected for ER random networks Chung2003 . We note that when the module sizes are similar, eigenvalue distributions for $\mathcal{L}$ of modular, as well as anti-modular, networks are platykurtic in nature (indicated by the excess kurtosis of the bulk of the distribution being negative) as is also the case for ER random networks. However, heterogeneity in module sizes leads to very different behavior for networks with $r < 1$ and $r > 1$. For modular networks, increasing $\sigma$ above [math] initially raises the excess kurtosis of the distribution to [math]. However, further increase of heterogeneity results in the larger module dominating the system properties and the semi-circle law is recovered for the bulk in the limit of large $\sigma$. In contrast, for anti-modular networks, increasing $\sigma$ makes the excess kurtosis positive, suggesting that the eigenvalue distribution becomes leptokurtic in the presence of strong module size heterogeneity.
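The link between identical neighborhoods and unity eigenvalues can be checked directly on a toy example. The sketch below assumes the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ and a hypothetical five-node graph with two hubs and three leaves; it applies the operator to a vector that is $+1$ on one leaf and $-1$ on another leaf with the same neighbors, and the result is the same vector, i.e., an eigenvector with eigenvalue 1.

```python
import math


def apply_normalized_laplacian(adj, x):
    """Return (I - D^{-1/2} A D^{-1/2}) x without building the matrix.

    adj: list of neighbour sets; every node is assumed to have degree >= 1.
    """
    deg = [len(nbrs) for nbrs in adj]
    return [
        x[u] - sum(x[w] / math.sqrt(deg[u] * deg[w]) for w in adj[u])
        for u in range(len(adj))
    ]


# Hypothetical toy graph: hubs 0 and 1, leaves 2, 3, 4, each leaf linked to both hubs.
adj = [{2, 3, 4}, {2, 3, 4}, {0, 1}, {0, 1}, {0, 1}]

# Difference of two leaves with identical neighbourhoods.
x = [0.0, 0.0, 1.0, -1.0, 0.0]
lx = apply_normalized_laplacian(adj, x)  # equals x, so x has eigenvalue 1
```

Each additional node sharing the same neighborhood contributes one more independent vector of this kind, which is why groups of structurally equivalent nodes pile up eigenvalues at 1.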
Another important spectral characteristic of $\mathcal{L}$ that is intimately related to the meso-level structural organization of the network is the relative size of the gaps occurring at the lower and upper ends of the eigenvalue spectrum, viz., [math] and [math], respectively. As the network becomes strongly modular ($r \ll 1$), there is a corresponding decrease in the smallest non-zero eigenvalue $\lambda_2$ of $\mathcal{L}$ (in the limit $r \to 0$, $\lambda_2 \to 0$). As a result, the lower spectral gap is seen to increase with decreasing $r$ [Fig. 3 (d)], which is associated with the appearance of distinct time-scales for the dynamics occurring at different scales in the network, viz., fast intra-modular and slower inter-modular processes Pan09 . Conversely, when the anti-modular character becomes more prominent as $r \to \infty$, the largest eigenvalue $\lambda_N$ of $\mathcal{L}$ approaches its maximum value ($\lambda_N = 2$) and the upper spectral gap is seen to increase [Fig. 3 (e)]. Such an association between the mesoscopic structural organization of the network and its spectral characteristics is also reflected in the corresponding gaps of the eigenvalue spectrum for the modularity matrix B [$B_{ij} = A_{ij} - k_i k_j / 2m$, where $m$ is the total number of connections] Newman_spectra . We observe that the spectral gaps for B are more sensitive to heterogeneity in module sizes than the corresponding quantities for $\mathcal{L}$. Large differences in the sizes of the two modules can mask the modular character of a network (for $r < 1$) as the larger module dominates the system (indeed $Q$ is seen to decrease with increasing heterogeneity). Hence, with increasing $\sigma$, we find that the upper spectral gap of B, which is linked to modularity, decreases [Fig. 3 (g)]. However, for $r > 1$, increasing $\sigma$ will make the distinct identity of the nodes belonging to the two “modules” even more prominent in terms of their degree (the limiting case corresponding to star-like networks). As a result, the lower spectral gap, which contributes to information about the anti-modular character of the network, increases with $\sigma$ [Fig. 3 (f)].
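The behavior of the largest eigenvalue in the bipartite limit can be illustrated with a small power-iteration sketch. It relies on the standard fact that the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ has eigenvalues in $[0, 2]$, with the value 2 attained when the graph is bipartite. The function names, the random starting vector and the iteration count are choices made here, not part of the original analysis.

```python
import math
import random


def _apply_laplacian(adj, deg, x):
    """(I - D^{-1/2} A D^{-1/2}) x for a graph given as neighbour sets."""
    return [
        x[u] - sum(x[w] / math.sqrt(deg[u] * deg[w]) for w in adj[u])
        for u in range(len(adj))
    ]


def largest_laplacian_eigenvalue(adj, iterations=500, seed=0):
    """Estimate the largest normalized-Laplacian eigenvalue by power iteration."""
    rng = random.Random(seed)
    deg = [len(nbrs) for nbrs in adj]
    x = [rng.random() for _ in adj]
    for _ in range(iterations):
        y = _apply_laplacian(adj, deg, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient of the (unit-norm) final iterate.
    y = _apply_laplacian(adj, deg, x)
    return sum(a * b for a, b in zip(x, y))


# Completely bipartite toy graph (two hubs, three leaves) versus a triangle.
bipartite = [{2, 3, 4}, {2, 3, 4}, {0, 1}, {0, 1}, {0, 1}]
triangle = [{1, 2}, {0, 2}, {0, 1}]
lambda_max_bipartite = largest_laplacian_eigenvalue(bipartite)
lambda_max_triangle = largest_laplacian_eigenvalue(triangle)
```

The bipartite graph reaches the upper bound of 2, whereas the triangle, which contains an odd cycle, stays below it; for large networks a dense or sparse eigensolver would be the practical choice instead of this pure-Python loop.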
Focusing now on the properties of the eigenvectors of $\mathcal{L}$, we observe that the eigenmodes corresponding to $\lambda_2$ and $\lambda_N$ convey information about the two modules into which the network is partitioned. Thus, for modular networks ($r < 1$), the group to which each node belongs can be identified from the sign of the corresponding component of $v_2$, the eigenvector associated with $\lambda_2$. On the other hand, for anti-modular networks ($r > 1$), this role is played by $v_N$, the eigenvector corresponding to the largest eigenvalue $\lambda_N$. Specifically, the distribution of the eigenvector components shows a bimodal nature, which becomes more prominent as $r$ approaches $0$ (for $v_2$) or diverges (for $v_N$) [see panels (c) and (a), respectively, of Fig. 4] note1 . For a homogeneous ER random network ($r = 1$), where such a partitioning is not possible, the distributions for both of these eigenvectors are unimodal [Fig. 4 (b)]. These observations suggest that we can identify the existence of anti-modular mesoscopic organization in a network by measuring the extent of bimodality in the distribution of components for $v_N$. For this purpose, we calculate the Bimodality Coefficient $BC$ = [math], where [math] is the skewness of the distribution and [math] is its excess kurtosis Pfister2013 . Fig. 4 (d) shows how model networks corresponding to different values of $r$ can be characterized in terms of the $BC$s of $v_2$ and $v_N$. Thus, modular networks ($r < 1$) are characterized by strong bimodality in $v_2$ with $BC > 5/9$, where $BC = 5/9$ corresponds to a uniform distribution, while anti-modular networks ($r > 1$) are seen when $v_N$ has strong bimodality.
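A bimodality coefficient of this kind can be sketched as follows. The form used here, (skewness² + 1) / (excess kurtosis + 3) with population moments, is the large-sample version of the coefficient and is an assumption: the exact expression used in the paper (for example, with a finite-sample correction) is not recoverable from the text. The function name is hypothetical.

```python
def bimodality_coefficient(values):
    """(skewness^2 + 1) / (excess kurtosis + 3), computed from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; the coefficient is undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return (skewness ** 2 + 1.0) / (excess_kurtosis + 3.0)


# Two sharp, equally weighted peaks (e.g. components of opposite sign in the two modules).
bc_two_peaks = bimodality_coefficient([-1.0, 1.0] * 50)

# Evenly spread values approximate a uniform distribution, the 5/9 reference point.
bc_uniform = bimodality_coefficient([i / 10000 for i in range(10001)])
```

Values above the uniform reference of 5/9 point towards bimodality, with the two-peak case giving the maximum value of 1.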
The eigenvectors of $\mathcal{L}$ also exhibit localization behavior associated with structural heterogeneities that can inform us about the outcome of diffusive processes on a network Nakao2010 ; Hata2017 . We quantify the localization of the $i$th eigenvector by its inverse participation ratio, $\mathrm{IPR}_i = \sum_{j=1}^{N} [v_i^{(j)}]^4$, where $v_i^{(j)}$ are the components of the normalized eigenvector Distint_EigVec_loc ; IPR_Phys_Lett . Complete delocalization is associated with the minimum value of $\mathrm{IPR} = 1/N$, when all components have equal contribution. Conversely, the maximum value of $\mathrm{IPR} = 1$ is associated with extreme localization, obtained when an eigenvector has only a single component having a finite contribution. Fig. 4 (e-g) show that a modular network exhibits high values of IPR for eigenmodes at both the lower and higher ends of the eigenvalue spectrum. On the other hand, anti-modular networks show delocalization in the principal eigenmodes while having strong localization in the central modes [Fig. 4(h-j)]. Localization in both types of networks becomes more prominent with increasing module size heterogeneity $\sigma$. The very different nature of localization behavior in modular and anti-modular networks is reflected in the localization-delocalization transition seen for the principal eigenmode (associated with the largest eigenvalue of $\mathcal{L}$) as $r$ is varied [Fig 4 (k)]. Thus, as the mesoscopic nature of the network changes from modular to anti-modular, we observe that the eigenmode becomes completely delocalized ($\mathrm{IPR} \to 1/N$ as $r$ diverges), irrespective of the extent of heterogeneity in module sizes (similar to transitions seen in the spectral behavior of network adjacency matrices Slanina2017 ).
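The two limiting values of the inverse participation ratio are easy to verify numerically. The sketch below assumes the common definition as the sum of fourth powers of the components of a unit-norm vector; normalizing inside the function is a convenience added here, and the function name is hypothetical.

```python
def inverse_participation_ratio(vector):
    """Sum of fourth powers of the components after normalizing to unit length."""
    norm_sq = sum(v * v for v in vector)
    return sum(v ** 4 for v in vector) / norm_sq ** 2


# Fully delocalized: all N = 4 components equal, giving 1/N.
ipr_delocalized = inverse_participation_ratio([1.0, 1.0, 1.0, 1.0])

# Fully localized: a single non-zero component, giving 1.
ipr_localized = inverse_participation_ratio([0.0, 0.0, 1.0, 0.0])
```

Sweeping this quantity over all eigenvectors of a network, ordered by eigenvalue, gives the kind of localization profile discussed above.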
To conclude, we have shown that networks having a partially bipartite structure exhibit properties that place them outside the well-known Rg/SW/Rn range of network structures. In particular, when the sizes of the two partitions into which the nodes are grouped are very unequal, the network has a communication efficiency higher than that of homogeneous ER random networks, and correspondingly lower clustering. This infra-SW property is related to the anti-modular character of these networks, which we demonstrate by analyzing an ensemble of model networks whose mesoscopic nature can be systematically varied. Our work also suggests signatures, such as BC for the principal eigenvector of the corresponding Laplacian, to identify potential anti-modular organization in a wide range of empirical networks. We observe that for strong module size heterogeneity, the degree distribution of anti-modular networks becomes bimodal, which can make such networks robust against a variety of perturbations. As anti-modular structure has been reported in several empirical situations, such as for networks representing the adjacent occurrence of different parts of speech Newman_spectra , bilateral investment agreements between nations Saban2010 , romantic online interactions Holme2003 , food webs Townsend1998 ; Estrada2005 and networks of farms and markets connected by movement of livestock Kiss2006 , it is important to understand whether such an organization appears because of functional considerations. Understanding how relative contributions of intra- and inter-dependence in networks comprising multiple partitions can impact, for instance, their robustness Singh2019 , will be a challenging problem for the future.
Acknowledgements.
This work was supported in part by IMSc Project of Interdisciplinary Science & Modeling (PRISM) and IMSc Complex Systems Project (XII Plan) funded by the Department of Atomic Energy, Government of India. We would like to thank Shakti Menon and K Chandrashekar for helpful discussions. The simulations required for this work were done in the Nandadevi cluster of the IMSc HPC facility.
Supplementary Material for “Anti-modular nature of partially bipartite networks makes them infra small-world”
I Data Description
For construction of the empirical networks defined by the adjacent occurrence of characters in texts of different natural languages that are written using alphabetic systems (shown in Figure 1 of the main text), we have used several different sources that are described below.
I.A Phoneme Network:
In order to construct the network connecting adjacent phonemes [Figure 1(a)] that occur in English words we have used a subset of 5321 words from a lemmatized list of 6318 frequently used words (i.e., with [math] occurrences) from the British National Corpus (https://www.kilgarriff.co.uk/BNClists/lemma.num, accessed: 26th May 2016) which were phonetically transcribed using the open-source Carnegie Mellon University Pronouncing Dictionary (http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/cmudict_SPHINX_40, accessed: 4th January 2018). The phonetic output is given in terms of the ARPAbet phoneme set (http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/SphinxPhones_40, accessed: 4th January 2018). The ARPAbet is a standardized set of 39 phonemes used for describing the pronunciation of words in different languages. The phonetic transcription is subsequently mapped to the International Phonetic Alphabet (IPA) notation using the mapping provided in https://en.wikipedia.org/wiki/ARPABET. The nodes of the network shown in Fig. 1 (a) are labeled using the IPA symbols.
I.B Orthographic Networks:
For the networks described by the adjacent occurrence of alphabetic characters in words written in different natural languages, we have used the following language corpora:
Arabic: We have used a database of 14867 unique words of Classical (or Quranic) Arabic, a Semitic language which was originally written using a consonantal alphabet (also known as an ‘abjad’). The present alphabet, considered an “impure abjad”, comprises 27 signs representing consonantal sounds (including a modifier and a glottal stop) and 9 signs that represent long vowels (3), as well as, combinations of long vowels with diacritical marks (3), diphthong (1) and glottal stop (2). The database is created by selecting all words written using at least two alphabetic characters from Tanzil, an international project started in 2007 to produce a standard Unicode text for the Qur’an (http://tanzil.net/download/, accessed: 25th March 2015).
Dutch: We have used unique non-hyphenated words having two or more characters from a list of the 10000 most commonly used words in Dutch, a member of the Germanic branch of the Indo-European language family. The data has been collected from the Wortschatz website maintained by the University of Leipzig (http://wortschatz.uni-leipzig.de/Papers/top10000nl.txt, accessed: 22nd May 2015). The Dutch signary consists of 31 distinct alphabetic characters comprising 21 consonants, 5 vowels, 3 vowels with diacritical marks (acute accents or diaeresis), the digraph ‘ij’ that is considered as a letter in the Dutch language and an extra letter from the German alphabet (the Eszett).
English: We have used the Mieliestronk list of 58109 distinct words (comprising two or more letters) in English - belonging to the Germanic branch of the Indo-European language family - that has been compiled by merging several different word-lists (http://www.mieliestronk.com/wordlist.html, accessed: 4th December 2011). The English signary is made up of 26 lower case letters of the English alphabet, comprising 5 vowels and 21 consonants. The list we have considered excludes spellings that are considered to be non-British. A hyphenated word is listed in unhyphenated form by removing the punctuation mark. The list contains singular and plural forms of several words, as well as, multiword phrases that are in common usage rendered as a single word.
Finnish: A list of the 10000 most commonly used words (all of which use two or more letters) in the Finnish language, belonging to the Finnic branch of the Uralic language family, has been used. The data, obtained from the Wikiverb website, has been collected from newsgroup discussions, press and modern literature (http://wiki.verbix.com/Documents/WordfrequencyFi, accessed: 24th June 2015). The Finnish signary has 25 distinct signs - i.e., all vowels and consonants of the modern Latin alphabet along with two additional vowels “ä” and “ö”, excepting “q”,“x” and “w”.
French: We have chosen 9189 unique words that are written using two or more alphabetic characters from a list of the 10000 most commonly used words in French, a Romance language belonging to the Indo-European family. The data has been collected from the Wortschatz website maintained by the University of Leipzig (http://wortschatz.uni-leipzig.de/Papers/top10000fr.txt, accessed: May 22nd 2015). The French signary has 30 distinct alphabetic characters comprising 26 letters of the Latin alphabet along with 3 vowels with diacritical marks (acute accents or diaeresis) and an apostrophe sign.
German: We have chosen 9152 distinct words that are represented using two or more alphabetic characters from a list of the 9172 most commonly used words in German, a member of the Germanic branch of the Indo-European language family. The data has been collected from the Wortschatz website maintained by the University of Leipzig (http://wortschatz.uni-leipzig.de/Papers/top10000de.txt, accessed: May 22nd 2015). The German signary has 32 distinct alphabetic characters comprising the 26 letters of the Latin alphabet along with 4 vowels having diacritical marks (umlauts or acute accents), a ligature (the Eszett or scharfes S) and an apostrophe sign.
Hausa (Boko): We have used a list of 7062 unique words that are written using two or more alphabetic characters, obtained from a Hausa online dictionary maintained by the University of Vienna (http://www.univie.ac.at/Hausa/KamusTDC/CD-ROMHausa/KamusTDC/ARBEIT2.txt, accessed: 19th May, 2015). The Hausa signary has 30 distinct alphabetic characters comprising 23 letters from the Latin alphabet, four additional signs representing glottalized consonants, two digraphs (‘sh’ and ‘ts’) and an apostrophe sign.
Malay (Rumi): We have chosen 9970 unique words that are written using two or more alphabetic characters from a list of 10000 most commonly used words in Malay, a member of the Austronesian language family. All the words are written in Rumi or Latin script, which is the most commonly used form for writing Malay at present, although a modified Arabic script (Jawi) also exists. The data has been collected from the list of high frequency words that are publicly available at Invoke IT Blog (https://invokeit.wordpress.com/frequency-word-lists/, accessed: 4th January, 2014). The signary comprises the 26 letters of the Latin alphabet.
Persian: We have used a list of 10000 most commonly used words (each represented using two or more characters) in Persian, a member of the Indo-Iranian branch of the Indo-European language family, which is written using a modified form of the consonantal Arabic alphabet (an ‘abjad’). The words are obtained from a list of high-frequency words compiled using the Tehran University for Persian Language corpus and available at Invoke IT Blog (https://invokeit.wordpress.com/frequency-word-lists/, accessed: 4th January 2014). The signary comprises 40 signs, viz., 32 consonantal signs, a long vowel indicator (‘alef madde’), a ligature (‘lām alef’), a diacritic (‘hamze’), 3 consonants with the ‘hamze’ diacritical mark and different forms for the consonants ‘kâf’ and ‘ye’ when they occur in final position.
Russian: We have used a list of 9011 distinct words that are written using two or more alphabetic characters in Russian, a member of the Slavic branch of the Indo-European language family, written using a Cyrillic alphabet. The data has been collected from Russian Learners’ Dictionary: 10,000 words in frequency order compiled by Nicholas J Brown (Routledge, London, 1996), after removing all words that use characters which are not included in the standard Russian alphabet (https://docs.google.com/spreadsheets/d/1hSsPR0fN7I456-TZOUFJwOb7GjSrqeoOo02hMCy9NfI/edit?pli=1#gid=7, accessed: 18th May 2015). The signary comprises the 33 letters of the modern Russian alphabet, i.e., 10 vowels, 21 consonants and 2 signs that indicate pronunciation.
Spanish: We have used a list of 4902 distinct high-frequency words (that are written using two or more alphabetic characters) in Spanish, a Romance language belonging to the Indo-European family. The data has been collected from A Frequency Dictionary of Spanish compiled by Mark Davies (Routledge, New York, 2006). The Spanish signary uses 35 distinct alphabetic characters comprising 26 letters of the basic Latin alphabet along with an additional character ñ and two digraphs (‘ch’ and ‘ll’), as well as vowels with diacritical marks (acute accents or diaeresis).
Turkish: We have used a list of 9909 distinct high-frequency words (that are written using two or more alphabetic characters) in Turkish, a member of the Turkic language family. The data has been collected from a Wiktionary word frequency list (https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Turkish_WordList_10K, accessed: 14th July 2015). The signary used has 32 letters, comprising 29 letters of the Turkish alphabet and 3 vowels used in conjunction with circumflex accents.
Urdu: We have used a database of 4998 unique words (that are represented using two or more characters) in Urdu, an Indo-Aryan language belonging to the Indo-European family, that is written using an extended Persian alphabet. The words are obtained from a list of frequently used words maintained by the Center for Language Engineering at Lahore (http://www.cle.org.pk/software/ling_resources/UrduHighFreqWords.htm, accessed: 1st January 2014). The signary comprises 46 signs, viz., 35 consonantal signs and 11 signs that represent long vowels (4), vowels with diacritics (2), vowels used in conjunction with a glottal stop (2), a diphthong (1) and two additional signs used for writing certain loan-words (2).
II Construction of Adjacency Matrix from Empirical Data
In order to construct the networks representing adjacent occurrence of graphemes in written texts, we have considered distinct phonemes (for phoneme network) or alphabetic signs (for orthographic network) as the nodes of the network. Connections between two nodes are made based on statistically significant co-occurrence of the two graphemes, corresponding to the two nodes, in adjacent positions in words included in the corpus under consideration. For example, consider two graphemes $i$ and $j$ that occur in a particular corpus. Let $n_{ij}$ be the number of times they are found in adjacent positions (in that order) in the words that occur in the database. We need to compare this with the frequency of co-occurrence entirely by chance. This is computed from a random surrogate of the database, which is constructed by randomly permuting the graphemes in every word of the original database. From $M$ such realizations of random surrogates, we obtain the mean $\langle n_{ij}^{\rm rand} \rangle$ and standard deviation $\sigma_{ij}^{\rm rand}$ of the frequency with which $j$ appears next to $i$ in a word simply as a chance outcome of their respective total frequencies of occurrence in the entire database. For the databases considered here, we have used $M = $ […]. Thus, we can define a measure of the statistical significance of the empirical frequency $n_{ij}$ as
$$ z_{ij} = \frac{n_{ij} - \langle n_{ij}^{\rm rand} \rangle}{\sigma_{ij}^{\rm rand}}. $$
If $z_{ij} > 0$ for any pair of graphemes, it suggests a possible significant association between them as they co-occur more than what is expected by chance. Therefore, by assigning a link between two nodes $i$ and $j$ whenever the $z$-score for the pair of graphemes associated with these is positive, we can define a network represented by the adjacency matrix A, where $A_{ij} = 1$ if $z_{ij} > 0$ and $A_{ij} = 0$, otherwise (Fig. S1). We note that, in general, $A_{ij} \neq A_{ji}$, as the frequencies of adjacent occurrence of two graphemes are different depending on the order in which they occur in words, i.e., $n_{ij} \neq n_{ji}$.
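The surrogate-based construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`adjacent_pair_counts`, `shuffled_surrogate`, `zscore_adjacency`) and the default of 200 surrogates are hypothetical, the shuffle is taken to act within each word (one reading of "permuting the graphemes in every word"), the population standard deviation is used, and pairs whose surrogate standard deviation is zero are simply left unlinked.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def adjacent_pair_counts(words):
    """Count ordered pairs (a, b) in which grapheme a is immediately followed by b."""
    counts = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    return counts


def shuffled_surrogate(words, rng):
    """Surrogate corpus: randomly permute the graphemes within every word (assumption)."""
    surrogate = []
    for word in words:
        graphemes = list(word)
        rng.shuffle(graphemes)
        surrogate.append(graphemes)
    return surrogate


def zscore_adjacency(words, n_surrogates=200, seed=0):
    """Return (nodes, adjacency), with adjacency[(a, b)] = 1 when the z-score is positive."""
    rng = random.Random(seed)
    nodes = sorted({g for word in words for g in word})
    observed = adjacent_pair_counts(words)

    # Collect the chance frequency of every ordered pair over all surrogates.
    samples = {(a, b): [] for a in nodes for b in nodes}
    for _ in range(n_surrogates):
        counts = adjacent_pair_counts(shuffled_surrogate(words, rng))
        for pair in samples:
            samples[pair].append(counts.get(pair, 0))

    adjacency = {}
    for pair, values in samples.items():
        mu, sigma = mean(values), pstdev(values)
        # The z-score is undefined when sigma == 0; no link is assigned then (assumption).
        z = (observed.get(pair, 0) - mu) / sigma if sigma > 0 else 0.0
        adjacency[pair] = 1 if z > 0 else 0
    return nodes, adjacency
```

Words may be passed either as strings (one character per grapheme) or as lists of graphemes, which is the safer choice when the signary contains digraphs. Because the counts are kept for ordered pairs, the resulting adjacency is in general asymmetric.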
In the main text, the adjacency matrices constructed using the above procedure for Arabic, Dutch and Finnish have been shown [Fig. 1(b)]. Fig. S2 shows the adjacency matrices for ten other languages. Table S1 provides detailed information about each of these orthographic networks. Apart from the total number of nodes (corresponding to different graphemes) and the number of vowels and consonants (or rather, non-vowels), which provide the sizes of the two partitions into which the nodes are divided, the different columns indicate the size of the largest connected component (i.e., the set of nodes for which a directed path exists from any node to any other node), the average number of connections per node, the overall connection density as well as the density within the two compartments and between the two compartments, and network metrics such as the average clustering coefficient, the communication efficiency, the modularity index and the assortativity coefficient.
To show the infra small-world nature of the anti-modular networks, we have compared their global properties, specifically, their average clustering coefficient $C$ and communication efficiency $E$, with the corresponding quantities $C_{\rm rand}$ and $E_{\rm rand}$ of the randomized network counterparts that have the same degree sequence as the anti-modular networks. Fig. S3 shows how these two network metrics vary (relative to those of randomized networks) as we change the mesoscopic nature of the model networks from modular to anti-modular. This is done by systematically increasing the ratio of inter- to intra-modular connection density, $r$, from values less than $1$ (when the network is modular) to values greater than $1$ (when the network becomes anti-modular). We have also shown the effect of module size heterogeneity by contrasting the situation where the module sizes are the same with one where they are different. To quantify the heterogeneity we have used the ratio of the size of the larger to the smaller partition. If $N$ is the total number of nodes and $N_1$ is the number of nodes in the larger partition, then this ratio corresponds to $N_1/(N - N_1)$. The two situations we consider are a ratio equal to $1$ (i.e., where the module sizes are the same) and a ratio greater than $1$ (where they differ). As can be seen from Fig. S3, anti-modular networks, particularly in the presence of appreciable module size heterogeneity, exhibit higher communication efficiency and lower clustering than their randomized network counterparts.
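The comparison described above can be illustrated with a small, self-contained Python sketch. It rests on several assumptions that are not taken from the paper: the model network is undirected with exactly two modules, the inter-modular density is set to the ratio times the intra-modular density, the degree-preserving randomization is done by double-edge swaps (the text only requires that the degree sequence be kept), and efficiency is computed as the mean inverse shortest-path length. All function and parameter names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def two_module_graph(n, n_large, rho_in, r, rng):
    """Undirected random graph with two modules; inter-module density is r * rho_in."""
    module = [0 if v < n_large else 1 for v in range(n)]
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        p = rho_in if module[u] == module[v] else min(1.0, r * rho_in)
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute zero
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs contribute 0."""
    n = len(adj)
    total = 0.0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def degree_preserving_shuffle(adj, n_swaps, rng):
    """Randomize by double-edge swaps, which leave every node's degree unchanged."""
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    edges = [(u, v) for u in new for v in new[u] if u < v]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b); reject self-loops and multi-edges.
        if len({a, b, c, d}) < 4 or d in new[a] or b in new[c]:
            continue
        new[a].remove(b)
        new[b].remove(a)
        new[c].remove(d)
        new[d].remove(c)
        new[a].add(d)
        new[d].add(a)
        new[c].add(b)
        new[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
    return new
```

A network generated with `two_module_graph` and its shuffled counterpart from `degree_preserving_shuffle` can then be passed to `average_clustering` and `efficiency` to form the two ratios relative to the randomized network; averaging over many realizations would be needed before drawing any conclusion.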
We have also investigated the spectral properties of the networks (which are directed, in general), focusing on the normalized symmetric Laplacian matrix defined for the strongly connected component of the network as follows:
$$ \mathcal{L} = I - \frac{1}{2} \left( \Phi^{1/2} P \, \Phi^{-1/2} + \Phi^{-1/2} P^{T} \Phi^{1/2} \right), $$
where I is the identity matrix, $P = D^{-1}A$ is the matrix of transition probabilities, A is the adjacency matrix, D is the diagonal matrix of out-degree (i.e., number of connections of a node directed outward from it) and $\Phi$ is the diagonal matrix whose entries are the components of $\phi$, the Perron-Frobenius eigenvector of P. For a strongly connected network defined by A (and provided it is aperiodic), the distribution of random walkers on the network will converge to the stationary distribution given by $\phi$ (see F. Chung, Ann. Comb. 9, 1 (2005) doi:10.1007/s00026-005-0237-z).
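As a rough illustration of how such a Laplacian could be computed, the sketch below follows the symmetrized form given in the Chung reference cited above, L = I - (Φ^{1/2} P Φ^{-1/2} + Φ^{-1/2} P^T Φ^{1/2})/2, with Φ the diagonal matrix of the stationary distribution. It is a minimal pure-Python version under assumptions: the adjacency matrix is a list of lists describing a strongly connected, aperiodic network, the stationary distribution is obtained by plain power iteration, and the function names are hypothetical.

```python
def transition_matrix(A):
    """P = D^{-1} A, with D the diagonal matrix of out-degrees (row sums of A)."""
    P = []
    for row in A:
        k_out = sum(row)
        if k_out == 0:
            raise ValueError("every node needs at least one outgoing link")
        P.append([a / k_out for a in row])
    return P


def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Left Perron-Frobenius eigenvector phi of P (phi P = phi, sum(phi) = 1)."""
    n = len(P)
    phi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, phi)) < tol:
            return new
        phi = new
    raise RuntimeError("power iteration did not converge (is the network aperiodic?)")


def directed_laplacian(A):
    """Return (L, phi) for a strongly connected, aperiodic directed network A."""
    n = len(A)
    P = transition_matrix(A)
    phi = stationary_distribution(P)
    s = [x ** 0.5 for x in phi]  # diagonal entries of Phi^{1/2}
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of (Phi^{1/2} P Phi^{-1/2} + Phi^{-1/2} P^T Phi^{1/2}) / 2
            sym = 0.5 * (s[i] * P[i][j] / s[j] + s[j] * P[j][i] / s[i])
            L[i][j] = (1.0 if i == j else 0.0) - sym
    return L, phi
```

The matrix returned is symmetric by construction, so its eigenvectors can be obtained with any symmetric eigensolver; the vector of square roots of the stationary distribution is its eigenvector with eigenvalue zero, which gives a quick sanity check.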
The distributions of the leading eigenvector and the eigenvector corresponding to the smallest finite eigenvalue of the normalized symmetric Laplacian for the model networks are shown in Fig. S4, as the mesoscopic nature of the network is varied by increasing $r$, the ratio of inter- to intra-modular connection density. In the main text, these distributions have been suggested as providing a signature of (anti-)modular organization in a network. As can be seen, when $r < 1$, such that the networks are modular in nature, the leading eigenvector has a unimodal distribution while the eigenvector corresponding to the smallest finite eigenvalue exhibits a bimodal distribution. The reverse is observed for $r > 1$, i.e., when the networks are anti-modular.
