Data-NoMAD: A Tool for Boosting Confidence in the Integrity of Social Science Survey Data
Sanford C. Gordon, Cyrus Samii, Zhihao Su

TL;DR
Data-NoMAD is a tool designed to verify the integrity of social science survey datasets by creating hashes at data collection that can be used to detect unauthorized modifications before data publication.
Contribution
Data-NoMAD introduces a column-level hash digest for datasets, letting researchers certify, and others verify, data integrity from collection through to the public replication archive.
Findings
Creates and verifies column hash digests for raw survey data downloaded from Qualtrics and SurveyCTO.
Detects modifications by identifying columns that have been deleted, added, or altered.
Complements existing data-sharing and replication workflows.
Abstract
To safeguard against data fabrication and enhance trust in quantitative social science, we present Data Non-Manipulation Authentication Digest (Data-NoMAD). Data-NoMAD is a tool that allows researchers to certify, and others to verify, that a dataset has not been inappropriately manipulated between the point of data collection and the point at which a replication archive is made publicly available. Data-NoMAD creates and stores a column hash digest of a raw dataset upon initial download from a survey platform (the current version works with Qualtrics and SurveyCTO), but before it is subject to appropriate manipulations such as anonymity-preserving redactions. Data-NoMAD can later be used to verify the integrity of a publicly archived dataset by identifying columns that have been deleted, added, or altered. Data-NoMAD complements existing efforts at ensuring research integrity and…
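The abstract's idea of a column hash digest can be illustrated with a minimal sketch. This is not Data-NoMAD's actual implementation: the choice of SHA-256, the CSV input, the assumption of unique column names, and the function names `column_digest` and `compare_digests` are all hypothetical choices made for illustration. It hashes each column of a raw export separately, then compares the stored digest against an archived dataset to report columns that were deleted, added, or altered.

```python
import csv
import hashlib
import io


def column_digest(csv_text):
    """Return {column name: SHA-256 hex digest of that column's values}.

    Hypothetical helper; assumes a header row with unique column names.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    digest = {}
    for i, name in enumerate(header):
        h = hashlib.sha256()
        for row in body:
            value = row[i] if i < len(row) else ""
            encoded = value.encode("utf-8")
            # Length-prefix each cell so ["ab", "c"] and ["a", "bc"] hash differently.
            h.update(len(encoded).to_bytes(8, "big"))
            h.update(encoded)
        digest[name] = h.hexdigest()
    return digest


def compare_digests(raw, archived):
    """Classify columns by comparing a stored raw digest with an archived one."""
    shared = set(raw) & set(archived)
    return {
        "deleted": sorted(set(raw) - set(archived)),
        "added": sorted(set(archived) - set(raw)),
        "altered": sorted(c for c in shared if raw[c] != archived[c]),
    }


# Illustrative data only: the archive drops "email" (a legitimate redaction),
# adds "weight", and changes one value in "age".
raw_csv = "id,age,email\n1,34,a@x.org\n2,51,b@x.org\n"
archived_csv = "id,age,weight\n1,34,0.8\n2,15,1.2\n"

report = compare_digests(column_digest(raw_csv), column_digest(archived_csv))
print(report)
```

Hashing per column rather than per file is what lets a verifier separate an expected redaction (a deleted identifier column) from an unexpected change to a substantive variable, since untouched columns keep matching digests.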
Taxonomy
Topics: Computational and Text Analysis Methods · Data Quality and Management · Big Data Technologies and Applications
