A multi-language toolkit for the semi-automated checking of research outputs
Richard J. Preen, Maha Albashir, Simon Davy, Jim Smith

TL;DR
This paper introduces SACRO, an open-source toolkit that semi-automates privacy disclosure checks in research outputs, enhancing collaboration between researchers and data custodians in secure environments.
Contribution
It presents a multi-language, best-practice framework that supports human-in-the-loop privacy checks with automated identification and mitigation of disclosive outputs.
Findings
Supports Python, R, and Stata through language-specific front-end packages
Automates identification of disclosive outputs
Provides audit-friendly reports and decision tracking
Abstract
This article presents a free and open-source toolkit that supports the semi-automated checking of research outputs (SACRO) for privacy disclosure within secure data environments. SACRO is a framework that applies best-practice, principles-based statistical disclosure control (SDC) techniques on the fly as researchers conduct their analyses. SACRO is designed to assist human checkers rather than to replace them, as current automated rules-based approaches seek to do. The toolkit is composed of a lightweight Python package that sits over well-known analysis tools that produce outputs such as tables, plots, and statistical models. This package adds functionality to (i) automatically identify potentially disclosive outputs against a range of commonly used disclosure tests; (ii) apply optional disclosure mitigation strategies as requested; (iii) report reasons for applying SDC; and (iv)…
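To make items (i) to (iii) concrete, here is a minimal sketch of one commonly used disclosure test, a minimum cell-count threshold on a frequency table, followed by optional suppression and a short reason list for a human checker. It is not the SACRO API: the function names, the threshold of 10, and the sample records are all hypothetical and chosen only for illustration, and it uses the Python standard library rather than any analysis package.

```python
from collections import Counter

# Hypothetical minimum cell count; real limits are set by the data custodian.
THRESHOLD = 10


def crosstab(rows, row_key, col_key):
    """Count records for each (row value, column value) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def check_threshold(table, threshold=THRESHOLD):
    """Return the cells whose non-zero count falls below the threshold."""
    return {cell: n for cell, n in table.items() if 0 < n < threshold}


def suppress(table, flagged):
    """Optional mitigation: replace flagged counts with None."""
    return {cell: (None if cell in flagged else n) for cell, n in table.items()}


def review(table, threshold=THRESHOLD):
    """Bundle a status and human-readable reasons for an output checker."""
    flagged = check_threshold(table, threshold)
    reasons = [
        f"cell {cell} has count {n} < {threshold}"
        for cell, n in sorted(flagged.items())
    ]
    return {
        "status": "fail" if flagged else "pass",
        "reasons": reasons,
        "flagged": flagged,
    }


# Made-up records, for illustration only.
records = (
    [{"sex": "F", "outcome": "yes"}] * 12
    + [{"sex": "F", "outcome": "no"}] * 15
    + [{"sex": "M", "outcome": "yes"}] * 3
    + [{"sex": "M", "outcome": "no"}] * 20
)

table = crosstab(records, "sex", "outcome")
report = review(table)
safe_table = suppress(table, report["flagged"])
print(report["status"], report["reasons"])
```

In this sketch the automated check only flags the small cell and records why. Whether to release the original table, release the suppressed version, or reject the output remains the human checker's decision, in keeping with the semi-automated design the abstract describes.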
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management
