Ética para LLMs: o compartilhamento de dados sociolinguísticos
Marta Deysiane Alves Faria Sousa, Raquel Meister Ko. Freitag, Túlio Sousa de Gois

TL;DR
This paper discusses the ethical issues in collecting and sharing sociolinguistic speech data for large language models, emphasizing data sensitivity and privacy concerns.
Contribution
It highlights ethical considerations and proposes strategies to handle sensitive sociolinguistic data in the context of LLM development.
Findings
Identifies ethical challenges in sociolinguistic data collection
Proposes strategies for privacy preservation in data sharing
Emphasizes the importance of data quality and representativeness
Abstract
The speech data collected in Sociolinguistics have the potential to enhance large language models because of their quality and representativeness. In this paper, we examine the ethical considerations associated with gathering and disseminating such data. We also outline strategies for addressing the sensitivity of speech data, which may make it possible to identify the informants who contributed their speech.
Taxonomy
Topics: linguistics and terminology studies · Natural Language Processing Techniques · Translation Studies and Practices
