Predict, Don't React: Value-Based Safety Forecasting for LLM Streaming

Pride Kavumba; Koki Wataoka; Huy H. Nguyen; Jiaxuan Li; Masaya Ohagi

arXiv:2604.03962·cs.CL·April 7, 2026

TL;DR

This paper introduces StreamGuard, a forecasting-based streaming safety guardrail for LLMs that predicts future harmfulness to enable early intervention without needing exact boundary annotations.

Contribution

The paper proposes a unified, model-agnostic approach to streaming moderation: a forecasting guardrail supervised by Monte Carlo rollouts, which improves safety performance across standard safety benchmarks.

Findings

01. StreamGuard improves input moderation F1 from 86.7 to 88.2.

02. StreamGuard achieves 97.5 F1 on a response moderation benchmark.

03. Forecasting supervision transfers effectively across models and tokenizers.

Abstract

In many practical LLM deployments, a single guardrail is used for both prompt and response moderation. Prompt moderation operates on fully observed text, whereas streaming response moderation requires safety decisions to be made over partial generations. Existing text-based streaming guardrails commonly frame this output-side problem as boundary detection, training models to identify the earliest prefix at which a response has already become unsafe. In this work, we introduce StreamGuard, a unified model-agnostic streaming guardrail that instead formulates moderation as a forecasting problem: given a partial prefix, the model predicts the expected harmfulness of likely future continuations. We supervise this prediction using Monte Carlo rollouts, which enables early intervention without requiring exact token-level boundary annotations. Across standard safety benchmarks, StreamGuard…
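The forecasting formulation in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical callables that are not part of the paper's stated interface: `sample_continuation`, which draws one continuation of a prefix from some generator, and `judge`, which scores a completed response for harmfulness in [0, 1]. The toy sampler, toy judge, rollout count, and threshold are placeholders chosen only so the example runs; the abstract does not specify any of them.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interfaces (assumptions for illustration, not the paper's API).
Sampler = Callable[[Sequence[str], random.Random], List[str]]
Judge = Callable[[Sequence[str]], float]


def rollout_value(prefix: Sequence[str], sample_continuation: Sampler,
                  judge: Judge, num_rollouts: int = 8, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected harmfulness of completions of `prefix`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_rollouts):
        continuation = sample_continuation(prefix, rng)
        total += judge(list(prefix) + continuation)
    return total / num_rollouts


def build_targets(response: Sequence[str], sample_continuation: Sampler,
                  judge: Judge, num_rollouts: int = 8
                  ) -> List[Tuple[Sequence[str], float]]:
    """One (prefix, target) pair per prefix length.

    The targets come from rollouts, so no token-level boundary label is needed.
    """
    pairs = []
    for t in range(1, len(response) + 1):
        prefix = response[:t]
        value = rollout_value(prefix, sample_continuation, judge, num_rollouts, seed=t)
        pairs.append((prefix, value))
    return pairs


def first_intervention(scores: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first prefix whose forecast reaches the threshold, else None."""
    for i, score in enumerate(scores):
        if score >= threshold:
            return i
    return None


def toy_sampler(prefix: Sequence[str], rng: random.Random) -> List[str]:
    # Toy generator: once "build" appears, most continuations turn harmful.
    risky = "build" in prefix
    return ["weapon"] if risky and rng.random() < 0.9 else ["cake"]


def toy_judge(tokens: Sequence[str]) -> float:
    # Toy harm scorer: flags any completed response containing "weapon".
    return 1.0 if "weapon" in tokens else 0.0


if __name__ == "__main__":
    response = ["how", "to", "build", "it"]
    for prefix, value in build_targets(response, toy_sampler, toy_judge):
        print(" ".join(prefix), "->", round(value, 2))
```

In this sketch, a guardrail model would be trained to regress the rollout targets from the prefix alone, and at inference time `first_intervention` would be applied to its per-prefix forecasts so that generation can be stopped before harmful text is actually emitted.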
