Should I disclose my dataset? Caveats between reproducibility and individual data rights

Raysa M. Benatti; Camila M. L. Villarroel; Sandra Avila; Esther L. Colombini; Fabiana C. Severi

arXiv:2211.00498·cs.CY·November 2, 2022

TL;DR

This paper discusses the ethical and legal challenges of disclosing court datasets in NLP research, balancing reproducibility with individual data rights, especially in sensitive cases like human rights violations.

Contribution

It presents legal and ethical considerations on disclosing such data and offers guidelines to help researchers decide whether to disclose datasets built from sensitive court documents.

Findings

1. Legal restrictions limit dataset sharing in sensitive cases
2. Guidelines help balance reproducibility with data privacy
3. Case study on gender discrimination highlights ethical issues

Abstract

Natural language processing techniques have helped domain experts solve legal problems. The digital availability of court documents increases possibilities for researchers, who can use them as a source for building datasets; disclosing such datasets is aligned with good reproducibility practices in computational research. Large, digitized court systems, such as the Brazilian one, are likely to be explored in this way. However, personal data protection laws impose restrictions on data exposure and establish principles that researchers should be mindful of. Special caution must be taken in cases involving human rights violations, such as gender discrimination, which we discuss as an example of interest. We present legal and ethical considerations on the issue, as well as guidelines for researchers who deal with this kind of data and must decide whether to disclose it.

Taxonomy

Topics: Ethics and Social Impacts of AI