Semantic Frame Aggregation-based Transformer for Live Video Comment Generation
Anam Fatima, Yi Yu, Janak Kapuriya, Julien Lalanne, Jainendra Shukla

TL;DR
This paper introduces SFAT, a novel transformer model that leverages semantic relevance weighting of video frames and multimodal knowledge to generate contextually appropriate live comments on video streams, supported by a new diverse English dataset.
Contribution
The paper presents a new Semantic Frame Aggregation-based Transformer (SFAT) that prioritizes relevant video frames and integrates multimodal knowledge for improved comment generation, along with a large diverse English dataset.
Findings
SFAT outperforms existing methods in comment quality.
The weighted frame aggregation improves contextual relevance.
The dataset enables better training and evaluation.
Abstract
Live commenting on video streams has surged in popularity on platforms like Twitch, enhancing viewer engagement through dynamic interactions. However, automatically generating contextually appropriate comments remains a challenging and exciting task. Video streams can contain a vast amount of data and extraneous content. Existing approaches tend to overlook the importance of prioritizing the video frames that are most relevant to ongoing viewer interactions. This prioritization is crucial for producing contextually appropriate comments. To address this gap, we introduce a novel Semantic Frame Aggregation-based Transformer (SFAT) model for live video comment generation. This method not only leverages CLIP's visual-text multimodal knowledge to generate comments but also assigns weights to video frames based on their semantic relevance to the ongoing viewer conversation. It employs an…
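The idea of weighting frames by their semantic relevance to the viewer conversation can be illustrated with a small sketch. This is not the SFAT implementation: the abstract does not specify the scoring function, so the cosine-similarity scoring, the softmax with a `temperature` parameter, the function names, and the toy two-dimensional embeddings below are all assumptions made for illustration. In practice the embeddings would come from CLIP's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores, temperature=1.0):
    """Numerically stable softmax; lower temperature sharpens the weights."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_frames(frame_embs, context_emb, temperature=0.1):
    """Pool frame embeddings into one vector, weighted by relevance to the chat context.

    Assumed scheme (hypothetical, not taken from the paper):
    weight_i = softmax_i(cos(frame_i, context) / temperature).
    """
    scores = [cosine(f, context_emb) for f in frame_embs]
    weights = softmax(scores, temperature)
    dim = len(context_emb)
    pooled = [sum(w * f[i] for w, f in zip(weights, frame_embs)) for i in range(dim)]
    return pooled, weights

# Toy stand-ins for CLIP frame embeddings and a chat-context text embedding.
frames = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
chat = [1.0, 0.1]
pooled, weights = aggregate_frames(frames, chat)
```

Here the first frame is most aligned with the chat embedding, so it receives the largest weight and dominates the pooled representation, while the weakly related second frame contributes very little.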
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis
