Fine-grained length controllable video captioning with ordinal embeddings
Tomoya Nitta, Takumi Fukuzawa, Toru Tamaki

TL;DR
This paper introduces a novel fine-grained length control method for video captioning using ordinal and bit embeddings, enabling precise control over caption length and reading timing.
Contribution
It proposes two length embedding methods based on multi-hot vectors, bit embedding and ordinal embedding, for finer-grained length control in video captioning models.
Findings
Effective length control is demonstrated on the ActivityNet Captions and Spoken Moments datasets.
Embedding vectors learned to separate length and semantic information.
Proposed methods outperform traditional linear embeddings in length regulation.
Abstract
This paper proposes a method for video captioning that controls the length of generated captions. Previous work on length control often had few levels for expressing length. In this study, we propose two methods of length embedding for fine-grained length control. A traditional embedding method is linear, using a one-hot vector and an embedding matrix. In this study, we propose methods that represent length in multi-hot vectors. One is bit embedding that expresses length in bit representation, and the other is ordinal embedding that uses the binary representation often used in ordinal regression. These length representations of multi-hot vectors are converted into length embedding by a nonlinear MLP. This method allows for not only the length control of caption sentences but also the control of the time when reading the caption. Experiments using ActivityNet Captions and Spoken Moments…
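The three length representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the maximum length `MAX_LEN`, the bit ordering, the "first k entries are 1" ordinal coding (the thermometer-style coding common in ordinal regression), and the MLP sizes, activation, and random weights are all assumptions chosen for illustration.

```python
import random

MAX_LEN = 16  # hypothetical maximum caption length (assumption, not from the paper)


def one_hot(length, max_len=MAX_LEN):
    """Traditional representation: a single 1 at position length - 1."""
    if not 1 <= length <= max_len:
        raise ValueError("length out of range")
    return [1 if i == length - 1 else 0 for i in range(max_len)]


def bit_vector(length, max_len=MAX_LEN):
    """Multi-hot bit representation: the binary digits of the length,
    most significant bit first (the ordering is an assumption)."""
    if not 1 <= length <= max_len:
        raise ValueError("length out of range")
    n_bits = max_len.bit_length()
    return [(length >> i) & 1 for i in reversed(range(n_bits))]


def ordinal_vector(length, max_len=MAX_LEN):
    """Multi-hot ordinal representation: the first `length` entries are 1,
    the rest 0, as is common in ordinal regression."""
    if not 1 <= length <= max_len:
        raise ValueError("length out of range")
    return [1 if i < length else 0 for i in range(max_len)]


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """A toy two-layer MLP with ReLU and random, untrained weights.
    It stands in for the nonlinear MLP that maps a multi-hot vector
    to a length embedding; sizes and activation are assumptions."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def forward(x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return forward


# Example: encode a target length of 5 words and embed the ordinal vector.
ordinal = ordinal_vector(5)
mlp = make_mlp(in_dim=len(ordinal), hidden_dim=32, out_dim=8)
length_embedding = mlp(ordinal)
```

In the ordinal coding, neighboring lengths differ in exactly one entry, so similar lengths receive similar inputs, whereas one-hot vectors for any two distinct lengths are equally far apart. The bit coding is more compact (5 entries instead of 16 here), but adjacent lengths can differ in several bits.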
Taxonomy
Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Multimodal Machine Learning Applications
Methods: Independent Component Analysis
