StreamSense: Streaming Social Task Detection with Selective Vision-Language Model Routing
Han Wang, Deyi Ji, Lanyun Zhu, Jiebo Luo, Roy Ka-Wei Lee

TL;DR
StreamSense is a real-time social signal detection system that combines a lightweight encoder with selective Vision-Language Model routing to improve accuracy and efficiency in streaming social media analysis.
Contribution
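StreamSense introduces a selective routing mechanism: a lightweight streaming encoder handles most timestamps, hard or ambiguous cases are escalated to a VLM expert, and decisions are deferred when context is insufficient. This reduces latency and computational cost while maintaining high accuracy.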
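The three-way routing decision can be illustrated with a minimal sketch. The summary does not state the actual routing criterion, so everything here is an assumption made for illustration: the confidence rule (maximum class probability), the context measure (seconds of buffered evidence), both thresholds, and the names `route` and `RouteDecision`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RouteDecision:
    action: str            # "encoder", "escalate", or "defer"
    label: Optional[int]   # predicted class index, only set for "encoder"


def route(
    probs: Sequence[float],
    context_seconds: float,
    conf_threshold: float = 0.8,       # hypothetical value, not from the paper
    min_context_seconds: float = 2.0,  # hypothetical value, not from the paper
) -> RouteDecision:
    """Pick one of three actions for the current timestamp.

    probs: class probabilities from the lightweight streaming encoder.
    context_seconds: how much evidence has been buffered so far (assumed proxy
    for "sufficient context").
    """
    # Too little evidence so far: defer instead of guessing.
    if context_seconds < min_context_seconds:
        return RouteDecision("defer", None)

    best = max(range(len(probs)), key=lambda i: probs[i])

    # Confident encoder prediction: answer locally, no VLM call.
    if probs[best] >= conf_threshold:
        return RouteDecision("encoder", best)

    # Ambiguous case: hand off to the (more expensive) VLM expert.
    return RouteDecision("escalate", None)
```

Under these assumptions, `route([0.9, 0.1], 5.0)` is answered by the encoder, `route([0.55, 0.45], 5.0)` is escalated, and `route([0.9, 0.1], 0.5)` is deferred. Raising `conf_threshold` sends more timestamps to the VLM, trading latency for accuracy.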
Findings
Outperforms VLM-only streaming in accuracy.
Reduces average latency and compute through selective escalation.
Effective on social tasks such as sentiment classification and hate content moderation.
Abstract
Live streaming platforms require real-time monitoring and reaction to social signals, utilizing partial and asynchronous evidence from video, text, and audio. We propose StreamSense, a streaming detector that couples a lightweight streaming encoder with selective routing to a Vision-Language Model (VLM) expert. StreamSense handles most timestamps with the lightweight streaming encoder, escalates hard/ambiguous cases to the VLM, and defers decisions when context is insufficient. The encoder is trained using (i) a cross-modal contrastive term to align visual/audio cues with textual signals, and (ii) an IoU-weighted loss that down-weights poorly overlapping target segments, mitigating label interference across segment boundaries. We evaluate StreamSense on multiple social streaming detection tasks (e.g., sentiment classification and hate content moderation), and the results show that…
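The two training terms named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' formulation: the contrastive term is written as a standard one-directional InfoNCE loss, the IoU weight is the temporal IoU between a streaming window and its labeled segment, and the loss is normalized by the sum of weights. The function names, the `temperature` value, and the normalization choice are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_weighted_ce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    windows: Sequence[Interval],
    segments: Sequence[Interval],
) -> float:
    """Cross-entropy where each sample is weighted by window/segment IoU.

    Windows that barely overlap their target segment contribute little,
    which is one way to limit label interference near segment boundaries.
    Normalizing by the weight sum is an assumption of this sketch.
    """
    total, weight_sum = 0.0, 0.0
    for p, y, w, s in zip(probs, labels, windows, segments):
        weight = temporal_iou(w, s)
        total += weight * -math.log(max(p[y], 1e-12))
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else 0.0


def info_nce(
    visual: List[List[float]],
    text: List[List[float]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """One-directional InfoNCE: visual[i] should match text[i].

    Embeddings are assumed to be L2-normalized, so the dot product is
    the cosine similarity.
    """
    loss = 0.0
    for i, v in enumerate(visual):
        logits = [sum(a * b for a, b in zip(v, t)) / temperature for t in text]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(visual)
```

For example, a window covering seconds 0 to 4 and a labeled segment covering seconds 2 to 6 overlap for 2 seconds out of a 6-second union, so that sample would be weighted by 1/3. A total training objective would presumably combine the two terms with some weighting, which the abstract does not specify.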
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Multimodal Machine Learning Applications
