Trust & Safety of LLMs and LLMs in Trust & Safety
Doohee You, Dan Chon

TL;DR
This systematic review explores the integration of Large Language Models into trust and safety applications, highlighting challenges, risks, and best practices for their responsible deployment in safeguarding digital environments.
Contribution
It provides a comprehensive synthesis of current research on LLMs in trust and safety, emphasizing emerging risks and practical solutions for responsible use.
Findings
Identifies key challenges like prompt injection and jailbreak attacks.
Summarizes best practices for deploying LLMs responsibly.
Highlights the potential of LLMs to enhance trust and safety.
Abstract
In recent years, Large Language Models (LLMs) have garnered considerable attention for their remarkable abilities in natural language processing tasks. However, their widespread adoption has raised concerns pertaining to trust and safety. This systematic review investigates the current research landscape on trust and safety in LLMs, with a particular focus on the novel application of LLMs within the field of Trust and Safety itself. We delve into the complexities of utilizing LLMs in domains where maintaining trust and safety is paramount, offering a consolidated perspective on this emerging trend. By synthesizing findings from various studies, we identify key challenges and potential solutions, aiming to benefit researchers and practitioners seeking to understand the nuanced interplay between LLMs and Trust and Safety. This review provides insights on best practices for using LLMs…
Taxonomy
Topics: Risk and Safety Analysis · Technology and Data Analysis · Safety Systems Engineering in Autonomy
Methods: Softmax · Attention Is All You Need · Focus
