Trustworthy AI: Safety, Bias, and Privacy -- A Survey

Xingli Fang; Jianwei Li; Varun Mulchandani; Jung-Eun Kim

arXiv:2502.10450·cs.CR·June 12, 2025

Open Access

TL;DR

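This survey reviews three concerns that undermine the trustworthiness of AI systems: safety (aligning large language models so they do not generate toxic or harmful content), bias (spurious biases that can mislead a network), and privacy (membership inference attacks on deep neural networks). The discussion draws on the authors' own experiments and observations.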

Contribution

It provides a comprehensive overview of safety, bias, and privacy concerns in AI, highlighting recent findings and perspectives to guide future trustworthy AI development.

Findings

01. Safety alignment strategies for large language models

02. Identification of spurious biases affecting model reliability

03. Analysis of membership inference attacks on neural networks

Abstract

The capabilities of artificial intelligence systems have been advancing to a great extent, but these systems still struggle with failure modes, vulnerabilities, and biases. In this paper, we study the current state of the field, and present promising insights and perspectives regarding concerns that challenge the trustworthiness of AI models. In particular, this paper investigates the issues regarding three thrusts: safety, privacy, and bias, which hurt models' trustworthiness. For safety, we discuss safety alignment in the context of large language models, preventing them from generating toxic or harmful content. For bias, we focus on spurious biases that can mislead a network. Lastly, for privacy, we cover membership inference attacks in deep neural networks. The discussions addressed in this paper reflect our own experiments and observations.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI