TL;DR
This paper introduces So-TVAE, a Sentiment-oriented Transformer-based Variational Autoencoder network for live video commenting that aims to improve comment diversity and sentiment control and to address data imbalance.
Contribution
It proposes a new model combining VAE and attention mechanisms to generate diverse, sentiment-aware live video comments, improving over existing methods.
Findings
Outperforms state-of-the-art in comment quality and diversity
Effectively handles data imbalance in live video datasets
Demonstrates strong results on Livebot and VideoIC datasets
Abstract
Automatic live video commenting is attracting increasing attention due to its significance in narration generation, topic explanation, etc. However, current methods do not consider the diverse sentiments of the generated comments. Sentimental factors are critical in interactive commenting, yet they have received little research attention so far. Thus, in this paper, we propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which consists of a sentiment-oriented diversity encoder module and a batch attention module, to achieve diverse video commenting with multiple sentiments and multiple semantics. Specifically, our sentiment-oriented diversity encoder combines a VAE with a random mask mechanism to achieve semantic diversity under sentiment guidance, and its output is then fused with cross-modal features to generate live video comments. Furthermore, a batch attention module…
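The abstract names two ingredients of the diversity encoder, a VAE and a random mask mechanism, without giving their formulation here. The following is a minimal sketch of the generic, textbook versions of those building blocks only, not the paper's actual model: the function names, the `[MASK]` token, and the independent per-token masking rule are all assumptions made for illustration, and sentiment guidance and cross-modal fusion are not modelled.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    This is the standard VAE reparameterization trick (assumed here,
    since the abstract does not spell out the sampling step).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def random_mask(tokens, mask_prob, rng, mask_token="[MASK]"):
    """Replace each token with mask_token independently with probability mask_prob.

    Hypothetical masking rule: the paper's random mask mechanism may differ.
    """
    return [mask_token if rng.random() < mask_prob else t for t in tokens]


# Usage: sample a latent code and mask a toy comment.
rng = random.Random(0)
z = reparameterize([0.0, 0.0], [0.0, 0.0], rng)
masked = random_mask(["so", "cute", "!"], 0.5, rng)
```

Sampling different `z` values for the same input is what lets a VAE-based decoder produce varied outputs, while masking hides part of the input so the model cannot simply copy it.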
