Can you Trust the Trend? Discovering Simpson's Paradoxes in Social Data
Nazanin Alipourfard, Peter G. Fennell, Kristina Lerman

TL;DR
This paper introduces a statistical method to detect Simpson's paradox in social data, demonstrating its application on Stack Exchange to reveal insights about user behavior and answer acceptance.
Contribution
The paper presents a novel approach for automatically identifying Simpson's paradox in social datasets, with empirical validation on Stack Exchange data.
Findings
Confirmed a known Simpson's paradox in user answer acceptance.
Discovered several new instances of Simpson's paradox in social data.
Provided insights into factors affecting answerer performance on Stack Exchange, specifically the likelihood that an answer is accepted by the asker.
Abstract
We investigate how Simpson's paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be different from, and even opposite to, those of the underlying subgroups. Failure to take this effect into account can lead analysis to wrong conclusions. We present a statistical method to automatically identify Simpson's paradox in data by comparing statistical trends in the aggregate data to those in the disaggregated subgroups. We apply the approach to data from Stack Exchange, a popular question-answering platform, to analyze factors affecting answerer performance, specifically, the likelihood that an answer written by a user will be accepted by the asker as the best answer to his or her question. Our analysis confirms a known Simpson's paradox and identifies several new instances. These…
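The comparison described in the abstract, trends in the aggregate data versus trends in the disaggregated subgroups, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain least-squares slope as the "trend", and the function names (`slope`, `simpsons_reversal`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope of y on x (assumed stand-in for the 'trend')."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.polyfit(x, y, 1)[0])


def simpsons_reversal(x, y, groups):
    """Compare the aggregate trend with the trend inside each subgroup.

    Returns (aggregate_slope, list_of_subgroup_slopes, reversed_flag).
    reversed_flag is True only when the aggregate slope is nonzero and
    every subgroup slope has the opposite sign.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    agg = slope(x, y)
    sub = [slope(x[groups == g], y[groups == g]) for g in np.unique(groups)]

    reversed_flag = agg != 0 and all(
        s != 0 and np.sign(s) == -np.sign(agg) for s in sub
    )
    return agg, sub, bool(reversed_flag)


# Toy data (made up): within each group y falls as x rises,
# but group "b" sits higher on both axes, so the pooled trend rises.
x = [1, 2, 3, 6, 7, 8]
y = [10, 9, 8, 15, 14, 13]
groups = ["a", "a", "a", "b", "b", "b"]

agg, sub, flag = simpsons_reversal(x, y, groups)
print("aggregate slope:", agg)
print("subgroup slopes:", sub)
print("paradox detected:", flag)
```

A sign comparison of this kind ignores statistical significance; a practical detector would also test whether each fitted trend is distinguishable from zero before declaring a reversal.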
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Wikis in Education and Collaboration
