Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models
Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang, Zhen Xiang, Chaowei Xiao, Kristina Gligori\'c, Muhao Chen

TL;DR
This paper introduces FoodGuardBench, a comprehensive benchmark for evaluating food safety in large language models, revealing vulnerabilities and proposing a specialized guardrail model to improve safety in food-related AI applications.
Contribution
The paper presents FoodGuardBench, the first domain-specific benchmark for food safety in LLMs, and introduces FoodGuard-4B, a tailored guardrail model to enhance safety and robustness.
Findings
Current LLMs show sparse safety alignment in food safety.
Existing guardrails often miss domain-specific malicious inputs.
FoodGuard-4B improves detection and safety in food-related queries.
Abstract
Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food safety remains a high-stakes domain where inaccurate or misleading information can cause severe real-world harm. Despite these risks, current LLMs and safety guardrails lack rigorous alignment tailored to domain-specific food hazards. To address this gap, we introduce FoodGuardBench, the first comprehensive benchmark comprising 3,339 queries grounded in FDA guidelines, designed to evaluate the safety and robustness of LLMs. By constructing a taxonomy of food safety principles and employing representative jailbreak attacks (e.g., AutoDAN and PAP), we systematically evaluate existing LLMs and guardrails. Our evaluation results reveal three critical vulnerabilities: First, current LLMs exhibit sparse safety alignment in the food-related domain,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
