Unsafe LLM-Based Search: Quantitative Analysis and Mitigation of Safety Risks in AI Web Search
Zeren Luo, Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Jingyi Zheng, Xinlei He

TL;DR
This paper quantifies safety risks in AI-powered search engines using LLMs, revealing frequent harmful content and malicious URL citations, and proposes a defense mechanism to mitigate these risks effectively.
Contribution
It provides the first systematic safety risk assessment of production AIPSEs and introduces an agent-based defense with GPT-4.1 for risk mitigation.
Findings
AIPSEs often generate harmful content with malicious URLs even for benign queries.
Querying URLs increases risk responses; natural language queries slightly reduce risk.
The proposed GPT-4.1-based defense reduces risks with minimal information loss (~10.7%).
Abstract
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of AI-Powered Search Engines (AIPSEs), offering precise and efficient responses by integrating external databases with pre-existing knowledge. However, we observe that these AIPSEs raise risks such as quoting malicious content or citing malicious websites, leading to harmful or unverified information dissemination. In this study, we conduct the first safety risk quantification on seven production AIPSEs by systematically defining the threat model, risk type, and evaluating responses to various query types. With data collected from PhishTank, ThreatBook, and LevelBlue, our findings reveal that AIPSEs frequently generate harmful content that contains malicious URLs even with benign queries (e.g., with benign keywords). We also observe that directly querying a URL will increase the number of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBig Data and Business Intelligence
MethodsAbsolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · GPT-4 · Umbrella Reinforcement Learning
