The Effects of Data Quality on the Analysis of Corporate Board Interlock Networks
Javier Garcia-Bernardo, Frank W. Takes

TL;DR
This paper investigates how data quality issues affect the analysis of large corporate board interlock networks, and proposes methods to assess data completeness and improve data accuracy so that network analysis results are more reliable.
Contribution
It introduces a novel approach to automatically assess and enhance data quality in large-scale corporate networks, addressing missing and biased data issues.
Findings
Data quality significantly affects network topology and centrality measures.
The proposed methods improve the accuracy of network analysis results.
Restoring data quality leads to more reliable insights into corporate interlocks.
Abstract
Nowadays, social networks of ever-increasing size are studied by researchers from a range of disciplines. The data underlying these networks is often automatically gathered from APIs, websites or existing databases. As a result, the quality of this data is typically not manually validated, and the resulting networks may be based on false, biased or incomplete data. In this paper, we investigate the effect of data quality issues on the analysis of large networks. We focus on the global board interlock network, in which nodes represent firms across the globe, and edges model social ties between firms -- shared board members holding a position at both firms. First, we demonstrate how we can automatically assess the completeness of a large dataset of 160 million firms, in which data is missing not at random. Second, we present a novel method to increase the accuracy of the entries in our…
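To make the network definition in the abstract concrete, the following is a minimal sketch of how a board interlock network could be derived from (firm, director) records, and of how a single missing record changes a firm's degree. It is not the authors' pipeline: the function names, the firm and director labels, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_interlock_network(positions):
    """Return {(firm_a, firm_b): shared_director_count} from (firm, director) pairs.

    Hypothetical helper: two firms are linked when at least one person
    holds a board position at both of them.
    """
    firms_by_director = defaultdict(set)
    for firm, director in positions:
        firms_by_director[director].add(firm)

    edges = defaultdict(int)
    for firms in firms_by_director.values():
        # Every pair of firms sharing this director gets (or strengthens) an edge.
        for firm_a, firm_b in combinations(sorted(firms), 2):
            edges[(firm_a, firm_b)] += 1
    return dict(edges)


def degree(edges):
    """Number of distinct interlock partners per firm."""
    deg = defaultdict(int)
    for firm_a, firm_b in edges:
        deg[firm_a] += 1
        deg[firm_b] += 1
    return dict(deg)


# Toy data (assumed): three firms, three directors, each sitting on two boards.
positions = [
    ("FirmA", "dir1"), ("FirmB", "dir1"),
    ("FirmB", "dir2"), ("FirmC", "dir2"),
    ("FirmA", "dir3"), ("FirmC", "dir3"),
]

edges_full = build_interlock_network(positions)

# Simulate one missing board position: dir3's seat at FirmC is not recorded.
incomplete = [p for p in positions if p != ("FirmC", "dir3")]
edges_missing = build_interlock_network(incomplete)

degree_full = degree(edges_full)
degree_missing = degree(edges_missing)
```

In this toy case, dropping one record removes the FirmA-FirmC tie, so the degrees of both firms fall while FirmB is unaffected. This is the kind of sensitivity of topology and centrality to incomplete data that the paper examines at scale.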
