STRIVE: Scene Text Replacement In Videos
Vijay Kumar B G, Jeyasri Subramanian, Varnith Chordia, Eugene Bart, Shaobo Fang, Kelly Guan, Raja Bala

TL;DR
This paper introduces a novel method for replacing scene text in videos by combining deep style transfer, learned photometric transformations, and a three-step process to ensure realistic, temporally consistent results across challenging conditions.
Contribution
It extends still-image text replacement techniques to video, addressing challenges such as changing lighting, motion blur, and temporal consistency through a new three-step approach and new datasets.
Findings
Realistic text transfer in synthetic and real videos
Competitive quantitative and qualitative performance
Faster inference than existing methods
Abstract
We propose replacing scene text in videos using deep style transfer and learned photometric transformations. Building on recent progress on still image text replacement, we present extensions that alter text while preserving the appearance and motion characteristics of the original video. Compared to the problem of still image text replacement, our method addresses additional challenges introduced by video, namely effects induced by changing lighting, motion blur, diverse variations in camera-object pose over time, and preservation of temporal consistency. We parse the problem into three steps. First, the text in all frames is normalized to a frontal pose using a spatio-temporal transformer network. Second, the text is replaced in a single reference frame using a state-of-the-art still-image text replacement method. Finally, the new text is transferred from the reference to remaining frames…
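The abstract describes the pipeline only at a high level, so the following is a minimal pure-Python sketch of the geometric idea behind steps 1 and 3, not the authors' implementation. It assumes each frame's text region is related to a canonical frontal plane by a known 3x3 homography (frontal coordinates to frame coordinates). In STRIVE the pose normalization is learned by a spatio-temporal transformer network and appearance changes by learned photometric transformations; here those are replaced by given homographies and a hypothetical per-frame (gain, bias) pair, and step 2 is assumed to have already produced the frontal replacement text `new_text`. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]          # grayscale image, values in [0, 1], indexed [y][x]
Mat3 = Sequence[Sequence[float]]   # 3x3 homography: frontal coords -> frame coords


def apply_h(m: Mat3, x: float, y: float) -> Tuple[float, float]:
    """Map point (x, y) through homography m, including the homogeneous divide."""
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return u / w, v / w


def invert_3x3(m: Mat3) -> List[List[float]]:
    """Inverse via the adjugate (transposed cofactor matrix)."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular homography")
    adj = [
        [e * i - f * h, -(b * i - c * h), b * f - c * e],
        [-(d * i - f * g), a * i - c * g, -(a * f - c * d)],
        [d * h - e * g, -(a * h - b * g), a * e - b * d],
    ]
    return [[val / det for val in row] for row in adj]


def frontalize(frame: Image, h: Mat3, out_h: int, out_w: int) -> Image:
    """Step 1 stand-in: resample the text region onto the frontal plane.

    For every frontal pixel p, read the frame at H(p) (nearest neighbour).
    The paper learns this normalization; here H is assumed to be given.
    """
    fh, fw = len(frame), len(frame[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            x, y = apply_h(h, u, v)
            ix, iy = round(x), round(y)
            if 0 <= iy < fh and 0 <= ix < fw:
                out[v][u] = frame[iy][ix]
    return out


def render_into_frame(frame: Image, new_text: Image, mask: List[List[bool]],
                      h: Mat3, gain: float, bias: float) -> Image:
    """Step 3 stand-in: warp frontal replacement text into one frame.

    Inverse mapping: each frame pixel looks up H^-1(pixel) in the frontal
    text. The hypothetical (gain, bias) pair is a crude stand-in for the
    learned photometric transformation. The input frame is not modified.
    """
    h_inv = invert_3x3(h)
    th, tw = len(new_text), len(new_text[0])
    out = [row[:] for row in frame]
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            u, v = apply_h(h_inv, x, y)
            iu, iv = round(u), round(v)
            if 0 <= iv < th and 0 <= iu < tw and mask[iv][iu]:
                out[y][x] = min(1.0, max(0.0, gain * new_text[iv][iu] + bias))
    return out


def transfer_to_frames(frames: List[Image], new_text: Image, mask: List[List[bool]],
                       homographies: List[Mat3],
                       photometric: List[Tuple[float, float]]) -> List[Image]:
    """Apply step 3 to every frame with its own geometry and photometry."""
    return [render_into_frame(f, new_text, mask, h, g, b)
            for f, h, (g, b) in zip(frames, homographies, photometric)]
```

For example, with two blank 4x4 frames, a 2x2 block of new text, pure translations of (1, 1) and (2, 1) as the homographies, and gains of 1.0 and 0.5, the text appears at the shifted positions with the second copy dimmed. On real footage the homographies would have to be estimated or learned, and bilinear sampling would replace nearest-neighbour lookup.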
