RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation

Yuefan Cao; Chengyue Gong; Xiaoyu Li; Yingyu Liang; Zhizhou Sha; Zhenmei Shi; Zhao Song

arXiv:2501.09982·cs.CV·February 4, 2025

TL;DR

RichSpace interpolates between text embeddings and selects the best interpolated embedding, so that a text-to-video model can generate videos with complex features that the text encoder's original embedding fails to capture.

Contribution

The paper presents a novel interpolation technique in text embedding space and a simple algorithm for selecting optimal embeddings to enhance text-to-video generation.
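
As an illustration of what interpolating in text embedding space could look like, the sketch below generates evenly spaced candidates between two prompt embeddings. Linear interpolation, the function name `interpolate_embeddings`, and the use of plain Python lists in place of encoder output tensors are assumptions made for this example; the summary above does not specify the interpolation scheme.

```python
def interpolate_embeddings(emb_a, emb_b, num_steps):
    """Return num_steps + 1 embeddings evenly spaced from emb_a to emb_b.

    Hypothetical helper: assumes linear interpolation between two
    flattened text embeddings of equal length.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")

    candidates = []
    for i in range(num_steps + 1):
        t = i / num_steps  # interpolation weight in [0, 1]
        candidates.append([(1 - t) * a + t * b for a, b in zip(emb_a, emb_b)])
    return candidates


# Toy 2-D vectors stand in for real text-encoder outputs.
candidates = interpolate_embeddings([0.0, 0.0], [2.0, 4.0], num_steps=4)
```

Each candidate could then be passed to the video model in place of the original prompt embedding. Choosing which candidate to use is a separate step.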

Findings

01. Improved video generation with complex features.

02. Effective selection of embeddings via perpendicular foot and cosine similarity.

03. Enhanced control over generated video content.

Abstract

Text-to-video generation models have made impressive progress, but they still struggle with generating videos with complex features. This limitation often arises from the inability of the text encoder to produce accurate embeddings, which hinders the video generation model. In this work, we propose a novel approach to overcome this challenge by selecting the optimal text embedding through interpolation in the embedding space. We demonstrate that this method enables the video generation model to produce the desired videos. Additionally, we introduce a simple algorithm using perpendicular foot embeddings and cosine similarity to identify the optimal interpolation embedding. Our findings highlight the importance of accurate text embeddings and offer a pathway for improving text-to-video generation performance.
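
The selection step named in the abstract can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes a third "target" embedding describing the desired video, takes the perpendicular foot to be the orthogonal projection of that target onto the line through the two endpoint embeddings, and picks the interpolation candidate with the highest cosine similarity to that foot. All function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine_similarity(u, v):
    # Assumes neither vector is all zeros.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def perpendicular_foot(emb_a, emb_b, emb_target):
    """Orthogonal projection of emb_target onto the line through emb_a and emb_b."""
    direction = [b - a for a, b in zip(emb_a, emb_b)]
    offset = [c - a for a, c in zip(emb_a, emb_target)]
    t = dot(offset, direction) / dot(direction, direction)  # assumes emb_a != emb_b
    return [a + t * d for a, d in zip(emb_a, direction)]


def select_candidate(candidates, foot):
    """Return (index, score) of the candidate most similar to the foot."""
    scores = [cosine_similarity(c, foot) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]


# Toy 2-D vectors stand in for real text-encoder outputs.
emb_a, emb_b, emb_target = [1.0, 0.0], [0.0, 1.0], [0.2, 0.6]
num_steps = 10
candidates = [
    [(1 - i / num_steps) * a + (i / num_steps) * b for a, b in zip(emb_a, emb_b)]
    for i in range(num_steps + 1)
]
foot = perpendicular_foot(emb_a, emb_b, emb_target)
best_index, best_score = select_candidate(candidates, foot)
```

In this toy setup the foot lies on the segment between the endpoints, so the selected candidate is simply the interpolation point closest in direction to it. With real embeddings the foot may fall outside the segment, which is a case this sketch does not handle specially.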

Taxonomy

Topics: Natural Language Processing Techniques · Video Analysis and Summarization · Handwritten Text Recognition Techniques