RISE-T2V: Rephrasing and Injecting Semantics with LLM for Expansive Text-to-Video Generation

Xiangjun Zhang; Litong Gong; Yinglin Zheng; Yansong Liu; Wentao Jiang; Mingyi Xu; Biao Wang; Tiezheng Ge; Ming Zeng

arXiv:2511.04317 · cs.CV · November 7, 2025

Open Access

TL;DR

RISE-T2V introduces a unified approach combining prompt rephrasing and semantic extraction using LLMs to improve text-to-video generation quality and user intent alignment.

Contribution

The paper presents RISE-T2V, a novel framework that integrates prompt rephrasing with semantic feature extraction into a single step, enhancing T2V models' scalability and performance.
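
To make the "single step" idea concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: `ToyLLM`, `EXPANSIONS`, `two_step`, and `single_step` are hypothetical names invented for illustration, the "hidden states" are fake deterministic vectors, and the assumption that conditioning features are the per-token states kept from the rephrasing pass is ours. The Rephrasing Adapter is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical rephrasing table standing in for a pre-trained LLM's generation.
EXPANSIONS = {"cat": ["a", "fluffy", "cat", "walking", "slowly"]}


@dataclass
class ToyLLM:
    dim: int = 4
    forward_passes: int = 0  # counts how many times the model is run

    def _hidden(self, token: str) -> List[float]:
        # Fake, deterministic "hidden state" derived from the token's characters.
        total = sum(ord(c) for c in token)
        return [((total * (i + 1)) % 97) / 97.0 for i in range(self.dim)]

    def rephrase_with_states(self, prompt: str) -> Tuple[List[str], List[List[float]]]:
        # One pass: expand the prompt and keep one state per generated token.
        self.forward_passes += 1
        tokens: List[str] = []
        for word in prompt.split():
            tokens.extend(EXPANSIONS.get(word, [word]))
        return tokens, [self._hidden(t) for t in tokens]

    def encode(self, text: str) -> List[List[float]]:
        # A separate pass that only extracts features from already-written text.
        self.forward_passes += 1
        return [self._hidden(t) for t in text.split()]


def two_step(llm: ToyLLM, prompt: str) -> List[List[float]]:
    # Step 1: rephrase and discard the states. Step 2: re-encode the new text.
    tokens, _ = llm.rephrase_with_states(prompt)
    return llm.encode(" ".join(tokens))


def single_step(llm: ToyLLM, prompt: str) -> List[List[float]]:
    # Rephrasing and feature extraction share one pass; states are reused directly.
    _, states = llm.rephrase_with_states(prompt)
    return states


llm_a, llm_b = ToyLLM(), ToyLLM()
features_two = two_step(llm_a, "cat")
features_one = single_step(llm_b, "cat")
```

In this toy the two routes return identical features only because the fake states ignore context; with a real LLM the states produced during generation would also depend on the original prompt. The point of the sketch is the control flow: the single-step route runs the model once, while the two-step route runs it twice.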

Findings

01 Significantly improves video quality with concise prompts.

02 Enhances alignment with user intent through prompt rephrasing.

03 Applicable to various pre-trained LLMs and diffusion models.

Abstract

Most text-to-video (T2V) diffusion models depend on pre-trained text encoders for semantic alignment, yet they often fail to maintain video quality when given concise prompts rather than well-designed ones. The primary issue lies in their limited understanding of textual semantics. Moreover, these text encoders cannot rephrase prompts online to better align with user intentions, which limits both the scalability and usability of the models. To address these challenges, we introduce RISE-T2V, which uniquely integrates prompt rephrasing and semantic feature extraction into a single, seamless step instead of two separate steps. RISE-T2V is universal and can be applied to various pre-trained LLMs and video diffusion models (VDMs), significantly enhancing their capabilities for T2V tasks. We propose an innovative module called the Rephrasing Adapter, enabling diffusion…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization