ComVi: Context-Aware Optimized Comment Display in Video Playback

Minsun Kim; Dawon Lee; Junyong Noh

arXiv:2603.26173·cs.MM·March 30, 2026

TL;DR

ComVi is a system that displays each comment during playback at the video moment it refers to, matched through audio-visual correlation, with the aim of improving engagement and reducing spoilers.

Contribution

ComVi maps comments to relevant video timestamps using audio-visual correlation, then builds the displayed comment sequence through an optimization over temporal relevance, popularity (number of likes), and display duration for comfortable reading.

Findings

01

In a user study, ComVi provided a significantly more engaging experience than conventional video interfaces (YouTube and Danmaku).

02

71.9% of participants preferred ComVi over YouTube and Danmaku.

03

The system effectively reduces spoilers by contextually displaying comments.

Abstract

On general video-sharing platforms like YouTube, comments are displayed independently of video playback. As viewers often read comments while watching a video, they may encounter ones referring to moments unrelated to the current scene, which can reveal spoilers and disrupt immersion. To address this problem, we present ComVi, a novel system that displays comments at contextually relevant moments, enabling viewers to see time-synchronized comments and video content together. We first map all comments to relevant video timestamps by computing audio-visual correlation, then construct the comment sequence through an optimization that considers temporal relevance, popularity (number of likes), and display duration for comfortable reading. In a user study, ComVi provided a significantly more engaging experience than conventional video interfaces (i.e., YouTube and Danmaku), with 71.9% of…
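The abstract names three factors in the sequencing step (temporal relevance, popularity, and display duration) but this page does not give the actual objective. The sketch below is therefore only an illustration, not the authors' method: it assumes each comment already has an estimated timestamp and a relevance score, and swaps in classic weighted interval scheduling to pick non-overlapping display slots. The `Comment` fields, the `reading_time` rate, and the `score` weights are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from math import log1p


@dataclass(frozen=True)
class Comment:
    text: str
    anchor: float     # assumed: timestamp (s) already estimated for this comment
    relevance: float  # assumed: comment-to-video correlation score in [0, 1]
    likes: int


def reading_time(text, words_per_second=3.0, minimum=2.0):
    """Hypothetical display duration: longer comments stay on screen longer."""
    return max(minimum, len(text.split()) / words_per_second)


def score(c, w_rel=1.0, w_pop=0.2):
    """Hypothetical utility mixing relevance with log-damped popularity."""
    return w_rel * c.relevance + w_pop * log1p(c.likes)


def schedule(comments):
    """Pick non-overlapping (start, end, comment) slots with maximal total score."""
    slots = sorted(
        ((c.anchor, c.anchor + reading_time(c.text), c) for c in comments),
        key=lambda s: s[1],
    )
    ends = [s[1] for s in slots]
    best = [0.0] * (len(slots) + 1)  # best[i]: optimum over the first i slots
    take = [False] * len(slots)
    prev = [0] * len(slots)
    for i, (start, _end, c) in enumerate(slots):
        # Count of earlier slots that finish no later than this one starts.
        p = bisect_right(ends, start, 0, i)
        prev[i] = p
        with_i = best[p] + score(c)
        if with_i > best[i]:
            best[i + 1] = with_i
            take[i] = True
        else:
            best[i + 1] = best[i]
    # Walk back through the table to recover the chosen slots.
    chosen = []
    i = len(slots)
    while i > 0:
        if take[i - 1]:
            chosen.append(slots[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return chosen[::-1]


comments = [
    Comment("great goal", 10.0, 0.9, 100),
    Comment("that pass though", 11.0, 0.5, 0),
    Comment("what a finish", 13.0, 0.8, 10),
]
plan = schedule(comments)
```

With these made-up inputs, the second comment overlaps the first and scores lower, so it is dropped. A real system would need some policy for such comments (for example, shifting them or showing several at once), which this sketch does not attempt.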
