Regional economic status inference from information flow and talent mobility
Jun Wang, Jian Gao, Jin-Hu Liu, Dan Yang, Tao Zhou

TL;DR
This study demonstrates that analyzing online social media relations and offline talent mobility networks can effectively predict regional economic status, with talent mobility data showing particularly strong predictive power.
Contribution
The paper introduces a novel approach combining social media and talent mobility networks to estimate regional economic status, highlighting the effectiveness of talent mobility data.
Findings
Talent mobility network predicts GDP with high accuracy.
Structural features of networks explain up to 84% of GDP variance.
Talent mobility data is a cost-effective indicator for socioeconomic analysis.
| Network | Resolution | # Regions | # Links | |
|---|---|---|---|---|
| OIF | Province | 31 | 961 | |
| OIF | City | 336 | 112,896 | |
| OTM | Province | 31 | 818 | 347.7 |
| OTM | City | 287 | 9,746 | 29.18 |
| Variables | OLS Model | |||
|---|---|---|---|---|
| (1) | (2) | (3-1) | (3-2) | |
| 0.823∗∗∗ | 0.587∗∗∗ | 0.266∗∗∗ | ||
| 0.363∗∗∗ | ||||
| 0.217∗∗ | 0.300∗∗∗ | 0.051 | 0.128∗∗ | |
| 0.216∗∗∗ | 0.087∗ | |||
| 0.041 | 0.059∗∗ | |||
| 0.208∗∗∗ | 0.010 | 0.203∗∗∗ | 0.096 | |
| 0.291∗∗∗ | 0.013 | 0.192∗∗∗ | 0.010 | |
| 0.067∗∗ | 0.016 | |||
| 0.103 | 0.120∗ | |||
| Obs. | 290 | 280 | 280 | |
| Adj. R² | 0.762 | 0.802 | 0.832 | |
1 Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, PRC
2 Institution of New Economic Development, Chengdu 610049, PRC
Social and economic systems; Structures and organization in complex systems; Dynamics of social systems
Jun Wang (1), Jian Gao (1, 2; E-mail: [email protected]), Jin-Hu Liu (1), Dan Yang (1), Tao Zhou (1, 2)
Abstract
Novel data has been leveraged to estimate socioeconomic status in a timely manner; however, direct comparison of the use of social relations and talent movements remains rare. In this letter, we estimate regional economic status based on the structural features of two networks. One is the online information flow network built on the following relations on social media, and the other is the offline talent mobility network built on the anonymized resume data of job seekers with higher education. We find that while the structural features of both networks are relevant to economic status, the talent mobility network, despite its relatively smaller size, exhibits a stronger predictive power for the gross domestic product (GDP). In particular, a composite index of structural features can explain up to about 84% of the variance in GDP. The result suggests that future socioeconomic studies should pay more attention to cost-effective talent mobility data.
PACS: 89.65.-s, 89.75.Fb, 87.23.Ge
1 Introduction
Timely estimation of social and economic status has important implications for addressing many development-related issues [1, 2, 3, 4], such as developing policies to reduce poverty [5], forecasting the unemployment rate [6, 7], and optimizing strategies for economic diversification [8, 9]. Traditional socioeconomic status inference, however, usually suffers from long delays because data collection consumes substantial resources. Thanks to technological development, novel data sources are now increasingly available for estimating socioeconomic status [10, 11]. For example, Elvidge et al. [12] produced a global poverty map based on the brightness of night-time lights. Gao and Zhou [13] quantified regional economic complexity by analyzing firm data. Dong et al. [14] measured economic activity by mining mobile phone records. Liu et al. [15] inferred city-level economic status from online activities. Blumenstock et al. [16] predicted district-level wealth distribution based on mobile phone usage. Sobolevsky et al. [17] estimated individual socioeconomic status by analyzing bank card transactions. More related works are summarized in recent reviews [2, 18].
Among these works, two streams of literature are of particular interest. One stream focuses on relations between social network structure and economic status [4, 19]. For example, Eagle et al. [20] uncovered a strong correlation between social network diversity and socioeconomic indicators, Mao et al. [21] found that the ratio of in-going and out-going calls can predict a region's income level, and Holzbauer et al. [22] showed that cross-state long ties on social media are strongly correlated with GDP in the US. Recently, Jahani et al. [23] uncovered a strong correlation between ego-network structural diversity and individual income, and Luo et al. [24] found that individuals' influence in social networks is predictive of their economic status. The other stream links human mobility patterns to socioeconomic status and outcomes [25, 27]. Individuals with different socioeconomic status have distinct mobility patterns [26, 28], and the movement of talents is critical to economic development [29]. For example, Frias-Martinez et al. [30] showed the predictive power of mobility patterns for socioeconomic status, Pappalardo et al. [31] found that movement diversity can predict socioeconomic indicators well, and Florez et al. [32] demonstrated that a group's income increases with the diversity of commuting trips.
Most previous works focus on either social network structures or human behavioral patterns [33]. Yet, direct comparison between the predictive power of online social network structure and offline human mobility patterns for regional and individual socioeconomic status remains insufficient. One challenge that hinders studies in this direction is the lack of large-scale and high-resolution online information and offline mobility data. Recently, the increasing availability of large-scale social and economic data with high spatial and temporal resolutions, such as mobile phone records [14], behavioral data [34], web-based ratings [35], and public profiles [36], has made it possible to estimate socioeconomic status in a timely manner and at a relatively low cost [2, 9]. This gives us a chance to compare the capability of information flow and talent mobility in inferring economic status.
In this letter, we infer regional economic status from the following relations on social media and the talent movements recorded by anonymized resume data. We first build two directed and weighted networks, named the online information flow network and the offline talent mobility network. Then, we calculate several network structural features and link them to GDP. Results show that some features exhibit strong correlations with GDP, such as the loops and outgoing spatial diversity of the information flow network and the out-strength and ingoing topological diversity of the talent mobility network. Overall, the talent mobility network features perform better in predicting economic status. After performing regression analysis for robustness checks, we further construct a composite index of both networks' structural features, which can explain up to about 84% of the variance in GDP.
2 Data and methods
In this section, we first introduce two large-scale online-crawled datasets for network construction, then present some measures to quantify network structures, and lastly introduce the methods applied for correlation and regression analyses.
2.1 Data description
The online information flow (OIF) network is built based on the public profiles and following relations among about 433 million users of China's social network Weibo, which provides functions similar to Twitter. Specifically, from profiles we extract users' locations covering 336 prefecture-level cities aggregated into 31 provinces (see Ref. [15] for details). Then, based on users' following relations we build the OIF network among regions (cities or provinces, depending on the resolution) and represent it by a weighted adjacency matrix $W^{\mathrm{OIF}}$, whose element $w^{\mathrm{OIF}}_{ij}$ is the volume of information from region $i$ to $j$, which is roughly estimated by the number of followings from region $j$ to $i$. As users within the same region can follow each other, $W^{\mathrm{OIF}}$ contains loops, i.e., $w^{\mathrm{OIF}}_{ii} > 0$ in $W^{\mathrm{OIF}}$. Fig. 1A presents the visualization of the provincial-level OIF network and Table 1 summarizes basic statistics.
The offline talent mobility (OTM) network is built based on the self-reported resume data of about 142 thousand anonymized Chinese job seekers with higher education (see Ref. [36] for details). Specifically, we roughly estimate the flow of talents among regions based on the movements of job seekers from birth city to living city in career development and from living city to expected city in job hunting. The resume data covers 287 prefecture-level cities aggregated into 31 provinces. Note that some cities are isolated due to data sparsity, and only cities remaining in the giant connected component are counted. The directed and weighted OTM network can also be represented by a weighted adjacency matrix $W^{\mathrm{OTM}}$, whose element $w^{\mathrm{OTM}}_{ij}$ is the number of talents who moved from region $i$ to $j$. Similarly, $W^{\mathrm{OTM}}$ contains loops. Fig. 1D visualizes the provincial-level OTM network and Table 1 summarizes basic statistics.
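The aggregation of individual movements into a directed, weighted region-level network can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record format (pairs of origin and destination region names), the function name `build_flow_matrix`, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict

def build_flow_matrix(moves):
    """Aggregate (origin, destination) records into weighted directed links.

    `moves` is a hypothetical list of (origin_region, destination_region)
    pairs, e.g. birth city -> living city or living city -> expected city.
    Returns a nested dict w where w[i][j] counts moves from i to j;
    w[i][i] is the self-loop (people who stay in the same region).
    """
    w = defaultdict(lambda: defaultdict(int))
    for origin, destination in moves:
        w[origin][destination] += 1
    # Convert to plain dicts so that missing links raise KeyError
    return {i: dict(row) for i, row in w.items()}

# Toy records (invented for illustration only)
toy_moves = [
    ("Chengdu", "Chengdu"),
    ("Chengdu", "Beijing"),
    ("Mianyang", "Chengdu"),
    ("Mianyang", "Chengdu"),
    ("Beijing", "Beijing"),
]
w = build_flow_matrix(toy_moves)
n_links = sum(len(row) for row in w.values())  # number of distinct directed links
```

Counting each distinct (origin, destination) pair once gives the number of links, and summing the counts gives the total flow, which is how statistics of the kind listed in Table 1 could be derived from raw records.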
Macroeconomic data at the province and city levels are collected respectively from the official books entitled “China Statistical Yearbook (2017)” and “China City Statistical Yearbook (2017)” released by the National Bureau of Statistics of China. Because compiling these statistics is time-consuming, the books provide data with a one-year lag, namely, for the year 2016. We successfully collected the GDP of 31 provinces and 290 prefecture-level cities, but failed for the remaining 46 cities due to missing data. The unit of GDP data is 10,000 RMB (about 1,500 USD).
2.2 Structural features
Considering a network with a weighted adjacency matrix $W$, we first calculate three direct structural features, namely, $s^{out}$, $s^{in}$, and $w_{ii}$ [37]. Specifically, for a region $i$, $s_i^{out} = \sum_j w_{ij}$ sums the weights of outgoing links, $s_i^{in} = \sum_j w_{ji}$ sums the weights of ingoing links, and $w_{ii}$ is the weight of the self-loop link. Then, we calculate three relative structural features, namely, $s^{in}/s^{out}$, $w_{ii}/s^{out}$, and $w_{ii}/s^{in}$. Specifically, the ratio $s^{in}/s^{out}$ measures the rates of local information/talent retention. The ratio $w_{ii}/s^{out}$ measures information/talent drain, where $w_{ii}/s^{out} = 0$ and $w_{ii}/s^{out} = 1$ mean that all information/talents are drained and kept, respectively. The ratio $w_{ii}/s^{in}$ measures information/talent gain, where $w_{ii}/s^{in} = 0$ means that new information/talents are gained and $w_{ii}/s^{in} = 1$ means that previous information/talents are kept.
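As a concrete illustration of these six features, the sketch below computes them for a small weighted matrix. The dictionary keys, the function name `simple_features`, and the toy matrix are assumptions for illustration. The sketch also assumes that strengths include the self-loop, which is what keeps the two loop ratios between 0 and 1.

```python
def simple_features(W):
    """Direct and relative structural features for each region.

    W is a square list of lists; W[i][j] is the flow from region i to j.
    Assumption: strengths include the self-loop W[i][i].
    """
    n = len(W)
    features = []
    for i in range(n):
        s_out = sum(W[i][j] for j in range(n))   # total outgoing weight
        s_in = sum(W[j][i] for j in range(n))    # total ingoing weight
        loop = W[i][i]                           # self-loop weight
        features.append({
            "s_out": s_out,
            "s_in": s_in,
            "loop": loop,
            "in_out_ratio": s_in / s_out,
            "loop_over_out": loop / s_out,  # 0: all drained, 1: all kept
            "loop_over_in": loop / s_in,    # 0: all new, 1: all previously kept
        })
    return features

# Toy 3-region matrix (invented numbers, every region has nonzero strengths)
W = [
    [6, 3, 1],
    [2, 8, 0],
    [4, 1, 5],
]
feats = simple_features(W)
```

For region 0 in this toy matrix, 6 of the 10 outgoing units stay local and 6 of the 12 ingoing units come from the region itself. Real data would need a guard for regions whose strength is zero.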
Moreover, we quantify diversity by calculating four network structural features: two topological diversity measures ($H^{out}$ and $H^{in}$) and two spatial diversity measures ($D^{out}$ and $D^{in}$) [20]. Specifically, the ingoing and outgoing topological diversity of a region is defined by the Shannon entropy associated with the information/talent flow into and out of the region, respectively. Formally, the outgoing topological diversity for region $i$ is given by
$$H_i^{out} = -\sum_{j} p_{ij} \log p_{ij},$$
where $p_{ij} = w_{ij} / \sum_{l} w_{il}$. The outgoing spatial diversity for region $i$ is calculated by normalizing $H_i^{out}$ using the number of involved regions. Mathematically,
$$D_i^{out} = \frac{H_i^{out}}{\log k_i^{out}},$$
where $k_i^{out}$ is the out-degree of region $i$. Analogously, the ingoing topological diversity for region $i$ is defined in a similar manner by
$$H_i^{in} = -\sum_{j} q_{ij} \log q_{ij},$$
where $q_{ij} = w_{ji} / \sum_{l} w_{li}$. The ingoing spatial diversity for region $i$ is calculated by normalizing $H_i^{in}$ using the number of involved regions, as
$$D_i^{in} = \frac{H_i^{in}}{\log k_i^{in}},$$
where $k_i^{in}$ is the in-degree of region $i$.
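To make the entropy-based diversity measures concrete, the following sketch computes the Shannon entropy of one region's flow distribution together with a degree-normalized version. Dividing by the logarithm of the number of links is an assumption of this sketch (it follows the normalization used in Ref. [20]); the function name `diversity` and the toy weights are likewise illustrative.

```python
import math

def diversity(weights):
    """Shannon entropy of a flow distribution and its normalized version.

    `weights` lists the link weights of one region in one direction
    (all outgoing links, or all ingoing links); zero weights are ignored.
    Returns (H, D): H is the entropy, D = H / log(k) with k the number
    of links. Normalizing by log(k) is an assumption of this sketch.
    """
    positive = [x for x in weights if x > 0]
    total = sum(positive)
    k = len(positive)
    H = -sum((x / total) * math.log(x / total) for x in positive)
    D = H / math.log(k) if k > 1 else 0.0
    return H, D

H_even, D_even = diversity([5, 5, 5, 5])    # flow spread evenly over 4 regions
H_skew, D_skew = diversity([17, 1, 1, 1])   # flow concentrated on one region
```

An even split over the connected regions gives the maximum normalized value of 1, while a flow concentrated on a single destination pushes both measures toward 0.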
2.3 Analytical methods
To explore the relations between structural features and GDP, we perform both correlation analysis and regression analysis. The Pearson correlation coefficient $r$ is used to quantify the linear correlation between two variables. The value of $r$ is in the range $[-1, 1]$, from negative to positive correlation. The ordinary least squares (OLS) model is employed to regress GDP against structural features. The estimated equation is given by
$$\mathrm{GDP} = \boldsymbol{\beta} \mathbf{X} + \epsilon,$$
where the structural variables $\mathbf{X}$ are in logarithmic form except for the diversity measures, $\boldsymbol{\beta}$ are the regression coefficients of the variables, and $\epsilon$ is the error term.
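The two analytical tools can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions, not the authors' code: the helper names `pearson` and `ols` are hypothetical, an intercept term is included by assumption, the normal equations are solved by Gauss-Jordan elimination, and the toy data are invented so that the true coefficients are known.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ols(X, y):
    """Least-squares coefficients for y ~ X via the normal equations.

    X is a list of rows (one row per region, one column per feature).
    An intercept column is prepended; including one is an assumption here.
    Returns [intercept, beta_1, ..., beta_p].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Build the augmented system (X^T X | X^T y)
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * t for r, t in zip(rows, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * u for v, u in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Toy data: y = 1 + 2*x1 - 3*x2 exactly (invented numbers)
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]]
y = [1 + 2 * a - 3 * b for a, b in X]
beta = ols(X, y)
r = pearson([row[0] for row in X], y)
```

In an actual analysis, the non-diversity features would be log-transformed before being passed to `ols`, and a statistics package would also report standard errors and the adjusted R² shown in Table 2.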
3 Results
In this section, we first analyze correlations between simple structural features and GDP, then summarize correlations between diversity-related features and GDP, and finally perform some robustness checks using regression models, based on which a composite index is further constructed to explore the prediction accuracy.
3.1 Correlation between simple features and GDP
The visualizations of the province-level online information flow (OIF) and offline talent mobility (OTM) networks are presented in Fig. 1A and 1D, in which the directed link weights are the numbers of followings and talents from origin to target provinces, respectively. For OIF, Fig. 1B and 1C (Left) present the relations between in-strength and out-strength at the province and city levels, respectively. We find that the two strengths are perfectly correlated with each other, as suggested by the Pearson correlation coefficient at both resolutions. In contrast, as shown in Fig. 1E and 1F (Left), the correlations between in-strength and out-strength for OTM are relatively weaker, suggesting an imbalance of talent flows into and out of regions.
The volume of information and talent flows can be relevant to a region's economic status. For OIF, Fig. 1B and 1C (Middle) present the relations between strength and GDP at the province and city levels, respectively. We notice that strength exhibits a high correlation with GDP. Fig. 1E and 1F (Middle) present a similar trend for OTM, while the correlations are stronger at both resolutions. The ratio of ingoing to outgoing flows can also be linked to economic status. For OIF, Fig. 1B and 1C (Right) present the relations between this ratio and GDP, where we find negative correlations at both resolutions. This suggests that developed regions spread information better. As presented by Fig. 1E and 1F (Right) for OTM, however, we find a positive correlation only at the city level.
These results suggest that attractiveness to talents in fine-grained regions reflects economic status better. This observation may originate from the inequality of regional economic development. For instance, China faces seriously unbalanced regional economic development, where more developed cities usually have talent gain, while less developed cities may have talent drain. This unbalanced talent mobility and economic development at the city level may result in the positive correlations. However, such correlation can be diminished at the aggregated province level, as a province can have multiple cities with different social and economic status, and talents can move among cities located in the same province.
The strength of loops in the OIF and OTM networks reflects the retention of local information and talents, respectively. For OIF, Fig. 2A and 2B present how the loop weight is related to the two strengths (Left and Right panels) at the province and city levels, respectively. Similarly, Fig. 2E and 2F present the relations for OTM. Overall, we find that loops are perfectly correlated with strengths. Further, we explore how information and talent retentions are linked to economic status by calculating correlations between GDP and three loop-related features, namely, the loop weight and its ratios to the out-strength and in-strength. For OIF, we find from Fig. 2C and 2D that GDP is positively correlated with all three features, with the strongest correlation found at the province level. Similar results hold for OTM as shown in Fig. 2G and 2H, with a high correlation with GDP at both resolutions. These results suggest the predictive power of local information and talent retentions for regional economic status.
3.2 Correlation between diversity features and GDP
We explore relations between GDP and two types of diversity-related features, namely, spatial diversity and topological diversity. For OIF, Fig. 3A and 3B present how GDP is related to the outgoing and ingoing spatial diversities, respectively. We observe strong negative correlations in both cases. As shown in Fig. 3C and 3D, similar observations hold for OTM, where the correlations are stronger. In particular, we notice that the same one of the two spatial diversities has the stronger correlation with GDP in both networks. A previous study based on UK communications showed that social network spatial diversity is positively correlated with community-level development [20]; however, our results based on both the OIF and OTM networks in China suggest that spatial diversities are negative predictors of regional economic status.
The topological diversity is equal to the spatial diversity for OIF as the network is fully connected. Therefore, only for OTM do we present how GDP is related to the two topological diversities, in Fig. 3E and 3F. We find that the correlation with GDP is significantly larger for one topological diversity than for the other, showing that it is the more relevant feature to economic status. In summary, we find that the spatial diversities of OIF and OTM are negative predictors of GDP, while the topological diversity of OTM is positively correlated with GDP.
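Why a fully connected network makes the two diversity measures interchangeable for correlation analysis can be checked numerically. The sketch below assumes that spatial diversity is the entropy divided by the logarithm of the degree; when every region has the same degree, this is a rescaling by a positive constant, and the Pearson correlation is unaffected by such a rescaling. All numbers and names are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical entropy values for five regions and a made-up outcome variable
H = [2.9, 3.1, 3.3, 3.0, 3.4]
gdp = [5.0, 4.2, 3.1, 4.8, 2.9]

k = 31  # in a fully connected 31-region network every region has degree 31
D = [h / math.log(k) for h in H]  # assumed normalization by log(degree)

r_H = pearson(H, gdp)
r_D = pearson(D, gdp)
```

Here `r_H` and `r_D` agree up to floating-point error. In a sparse network such as the city-level OTM, degrees differ across regions, so the two measures can rank regions differently.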
3.3 Regression analysis and composite index
The Pearson correlations between structural features and economic development (GDP) are summarized in Fig. 4. As shown in Fig. 4A for OIF, simple structural features except the ratio of ingoing to outgoing strength have strongly positive correlations with GDP, while diversity-related features exhibit strongly negative correlations. Moreover, network structural features are more relevant to economic status at the province level than at the city level. In particular, the most relevant features are loops and diversities at the province level as well as strengths and loops at the city level. As presented in Fig. 4B for OTM, the most relevant features are the same at both the province and city levels.
We further perform some robustness checks by employing the ordinary least squares (OLS) model to regress GDP against structural features at the city level. Table 2 summarizes the regression results. As shown in columns (1) and (2), the models including the OIF and the OTM network structural features can explain up to 76.2% and 80.2% of the variance in GDP, respectively. In particular, we notice that two OIF features are, respectively, significantly positive and negative predictors of GDP, while only one OTM feature is a positive predictor of GDP. When both networks' structural features are included in column (3), where column (3-1) and column (3-2) correspond respectively to OIF and OTM, up to 83.2% of the variance in GDP can be explained. We additionally find that loops are the features that best explain the variance in GDP, as measured by the adjusted $R^2$, for both OIF and OTM. These results confirm that the OTM network structural features are more predictive of regional economic development.
Based on the regression analysis, we construct a composite index of network structural features for the best prediction of regional economic status. Specifically, the composite index is calculated by weighting structural features by their regression coefficients. Formally, the composite index for region $i$ is given by
$$C_i = \boldsymbol{\beta} \cdot \mathbf{X}_i,$$
where $\mathbf{X}$ is the ten vectors of network structural features, and $\boldsymbol{\beta}$ is the vector of corresponding regression coefficients as shown in Table 2. Specifically, $\mathbf{X}^{\mathrm{OIF}}$ and $\boldsymbol{\beta}^{\mathrm{OIF}}$ are for OIF, and $\mathbf{X}^{\mathrm{OTM}}$ and $\boldsymbol{\beta}^{\mathrm{OTM}}$ are for OTM. All network structural features are standardized by the $z$-score [38] before constructing the composite index.
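The construction of such a composite index can be sketched as follows: each feature is standardized by its z-score and the standardized values are combined with regression coefficients as weights. The feature names, values, and coefficients below are invented for illustration and are not taken from Table 2; using the population standard deviation in the z-score is also an assumption.

```python
import math

def zscore(values):
    """Standardize a list to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def composite_index(features, coefficients):
    """Weighted sum of z-scored features for each region.

    `features` maps a feature name to its list of per-region values;
    `coefficients` maps the same names to regression coefficients.
    """
    z = {name: zscore(vals) for name, vals in features.items()}
    n_regions = len(next(iter(z.values())))
    return [sum(coefficients[name] * z[name][i] for name in z)
            for i in range(n_regions)]

# Invented feature values and coefficients for four regions
features = {
    "loop": [10.0, 20.0, 30.0, 40.0],
    "diversity": [0.9, 0.7, 0.8, 0.4],
}
coefficients = {"loop": 0.8, "diversity": -0.2}
index = composite_index(features, coefficients)
```

Because every standardized feature has zero mean, the resulting index is also centered on zero. A region scores high when it combines large values of positively weighted features with small values of negatively weighted ones.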
The correlations between the composite index and normalized GDP at the city level are presented in Fig. 5A and 5B for OIF and OTM, respectively. For both networks, we find that GDP is strongly and positively correlated with the composite index. In particular, the composite index of OTM exhibits a slightly larger correlation with GDP than that of OIF. The composite indices of OIF and OTM can explain 76.5% and 80.6% of the variance in GDP, respectively. These observations suggest strong predictive powers of information and talent flows for regional economic development. Further, we construct a composite index using the structural features of both networks. As shown in Fig. 5C, this composite index has the largest correlation with GDP, and it can explain up to 83.8% of the variance in GDP. The result shows that combining network features of information flow and talent mobility can enhance the performance of economic status inference.
4 Conclusion and discussions
In summary, we have explored the inference of regional economic status from the online information flow network and the offline talent mobility network. The former was built on the following relations among about 433 million social media users, and the latter was built on the self-reported resume data of over 142 thousand job seekers with higher education. After performing the correlation analysis, we found that the strengths of both networks have strongly positive correlations with GDP, and the loop-related network features are the most relevant. Moreover, we uncovered negative correlations between GDP and the spatial diversities for both networks, while the topological diversities of the talent mobility network are positively correlated with GDP. Interestingly, we found that the talent mobility network features exhibit a stronger predictive power for GDP although the network covers only about 1/3000 as many people as the information flow network. This suggests a more cost-effective way to infer economic status by leveraging relatively small-scale offline talent mobility data.
The correlations between GDP and the information flow network structural features diminish at the fine-grained resolution. In particular, we observed negative correlations between spatial diversities and GDP, which differs from the previous finding [20]. Whether this inconsistency originates from the inequality and complexity of China's regional development [13] remains an open issue. Through the regression analysis, we found that the significant predictors of GDP are the out-strength, ratios of loops and spatial diversities of the information flow network as well as the out-strength, loops and outgoing topological diversities of the talent mobility network. Based on the regression results, we further constructed a composite index of both networks' structural features that can explain up to about 84% of the variance in GDP. The result suggests a way of improving economic status inference by combining different network information.
The presented results should be interpreted in the light of some limitations of the data and analytical methods, which call for further exploration. The estimation of information flow was solely based on social media, where taking into account other information exchange channels such as online chats [39] and mobile communications [40, 41] would help. The resume data covers a relatively small sample, where adding other large-scale data from human resource services [36], academic publishers [42] and formal talent markets [43] would be an improvement. Recently available large-scale data with high spatio-temporal resolution would advance studies comparing the predictive power of different data sources for inferring socioeconomic status. Moreover, only a limited number of structural features were considered, and many network ranking indicators [44] could also be included. In addition, it would be interesting to apply variant models to predict and validate regional and temporal changes of GDP based on time-windowed past GDP and network data, and we leave this for future work when data are available. Keeping these limitations in mind, we hope our work will spark further studies on economic status inference from the aspects of both information flow and talent mobility.
Acknowledgements.
The authors acknowledge Hao Chen, Jing-Yi Liao, Zhong-Zheng Peng, Zhi-Hai Rong and Jun-Ming Shao for helpful discussions and Rui-Tong Wang for processing the raw data files. This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61433014, 61603074, 61673086, and 61703074).
References
- [1] Schweitzer F., Fagiolo G., Sornette D., Vega-Redondo F., Vespignani A. and White D. R., Science, 325 (2009) 422.
- [2] Einav L. and Levin J., Science, 346 (2014) 1243089.
- [3] Perc M., J. R. Soc. Interface, 11 (2014) 20140378.
- [4] Zhang X., Shao S., Stanley H. E. and Havlin S., EPL, 108 (2014) 58001.
- [5] Birdsall N. and Londoño J. L., Am. Econ. Rev., 87 (1997) 32.
- [6] Llorente A., Garcia-Herranz M., Cebrian M. and Moro E., PLoS ONE, 10 (2015) e0128692.
- [7] Yuan J., Zhang Q.-M., Gao J., Zhang L., Wan X.-S., Yu X.-J. and Zhou T., Physica A, 444 (2016) 442.
- [8] Alshamsi A., Pinheiro F. L. and Hidalgo C. A., Nat. Commun., 9 (2018) 1328.
