Valued Ties Tell Fewer Lies: Why Not To Dichotomize Network Edges With Thresholds
Andrew C. Thomas, Joseph K. Blitzstein

TL;DR
Dichotomizing valued network edges at a threshold often discards substantial information and reduces statistical efficiency, especially in large networks, which calls the common practice of threshold-based dichotomization into question.
Contribution
This paper critically examines the effects of threshold-based dichotomization of valued network edges, highlighting the substantial information and efficiency losses involved.
Findings
Threshold criteria produce wide variations in binary graphs
Dichotomization causes significant information loss in valued networks
Efficiency loss increases with network size in time series models
Abstract
In order to conduct analyses of networked systems where connections between individuals take on a range of values - counts, continuous strengths or ordinal rankings - a common technique is to dichotomize the data according to their positions with respect to a threshold value. However, there are two issues to consider: how the results of the analysis depend on the choice of threshold, and what role the presence of noise has on a system with respect to a fixed threshold value. We show that while there are principled criteria of keeping information from the valued graph in the dichotomized version, they produce such a wide range of binary graphs that only a fraction of the relevant information will be kept. Additionally, while dichotomization of predictors in linear models has a known asymptotic efficiency loss, the same process applied to network edges in a time series model will lead to…
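To make the threshold-sensitivity point concrete, the following is a minimal sketch, not the authors' method. It dichotomizes a small valued graph at several thresholds and reports how basic binary-graph summaries change. The edge values, node labels, and the helper names `dichotomize`, `density`, `n_components`, and `summarize` are all hypothetical and chosen only for illustration.

```python
# Hypothetical valued ties (e.g. interaction counts) among five nodes.
# These numbers are invented for illustration, not taken from the paper.
WEIGHTS = {
    ("a", "b"): 9,
    ("a", "c"): 4,
    ("b", "c"): 6,
    ("b", "d"): 2,
    ("c", "d"): 1,
    ("d", "e"): 3,
}
NODES = sorted({node for pair in WEIGHTS for node in pair})


def dichotomize(weights, threshold):
    """Keep an undirected edge only if its value is at or above the threshold."""
    return {pair for pair, w in weights.items() if w >= threshold}


def density(edges, n_nodes):
    """Fraction of possible undirected ties that are present."""
    return len(edges) / (n_nodes * (n_nodes - 1) / 2)


def n_components(edges, nodes):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def summarize(weights, nodes, thresholds):
    """Return (threshold, edge count, density, components) per threshold."""
    rows = []
    for t in thresholds:
        edges = dichotomize(weights, t)
        rows.append((t, len(edges), density(edges, len(nodes)),
                     n_components(edges, nodes)))
    return rows


if __name__ == "__main__":
    for t, m, d, c in summarize(WEIGHTS, NODES, [1, 3, 5, 7]):
        print(f"threshold={t}: edges={m}, density={d:.2f}, components={c}")
```

On this toy input the same valued graph yields anything from one connected component with six edges to four components with a single edge, depending only on the cutoff. Which of those binary graphs is "the" network is exactly the choice the abstract flags as problematic.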
Taxonomy
Topics: Crime, Illicit Activities, and Governance · Qualitative Comparative Analysis Research · Complex Network Analysis Techniques
