TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu, Peng Di, Rui Wang

TL;DR
TingIS is an enterprise system that uses multi-stage event linking, LLMs, and noise reduction to detect and attribute customer incidents in real time from noisy, high-volume data.
Contribution
The paper introduces TingIS, a comprehensive system combining indexing, LLMs, and domain knowledge for real-time incident discovery at enterprise scale.
Findings
Handles over 2,000 messages per minute with a 95% discovery rate for high-priority incidents.
Achieves a P90 alert latency of 3.5 minutes.
Outperforms baseline methods in accuracy and noise reduction.
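For readers unfamiliar with the latency metric above, a P90 of 3.5 minutes means that 90% of alerts fire within 3.5 minutes. The sketch below shows one common way to compute such a figure, the nearest-rank percentile. The summary does not say which percentile method the paper uses, and the function name `percentile_nearest_rank` and the sample latencies are hypothetical, chosen only for illustration.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers at least p% of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical alert latencies in minutes (illustrative only, not data from the paper).
latencies_min = [0.8, 1.2, 1.5, 1.9, 2.1, 2.4, 2.8, 3.1, 3.5, 6.0]
p90 = percentile_nearest_rank(latencies_min, 90)
print(f"P90 alert latency: {p90} minutes")
```

Unlike a mean, a tail percentile shows how slow the slower alerts are while ignoring only the worst 10%, so a single extreme outlier (the 6.0 here) does not move it.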
Abstract
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user trust. While customer incidents serve as a vital signal for discovering risks missed by monitoring, extracting actionable intelligence from this data remains challenging due to extreme noise, high throughput, and semantic complexity of diverse business lines. In this paper, we present TingIS, an end-to-end system designed for enterprise-grade incident discovery. At the core of TingIS is a multi-stage event linking engine that synergizes efficient indexing techniques with Large Language Models (LLMs) to make informed decisions on event merging, enabling the stable extraction of actionable incidents from just a handful of diverse user descriptions. This engine is complemented by a cascaded…
