Lyric Video Analysis Using Text Detection and Tracking
Shota Sakaguchi, Jun Kato, Masataka Goto, and Seiichi Uchida

TL;DR
This paper presents a method for recognizing and tracking lyric words in lyric videos by combining scene text detection and recognition, lyric-frame matching, and trajectory analysis, as a first step toward automatic lyric video generation.
Contribution
It introduces an approach that combines scene text detection, lyric-frame matching, and trajectory clustering to analyze the motion of lyric words in videos.
Findings
Effective tracking of lyric words despite distortions and movements
Successful clustering of lyric word trajectories
Foundation for automatic lyric video generation
Abstract
We attempt to recognize and track lyric words in lyric videos. A lyric video is a music video that shows the lyric words of a song. The main characteristic of lyric videos is that the lyric words appear in the frames in synchrony with the music. Recognizing and tracking the lyric words is difficult because (1) the words are often decorated and geometrically distorted and (2) the words move arbitrarily and drastically within the video frame. The purpose of this paper is to analyze the motion of the lyric words in lyric videos, as the first step toward automatic lyric video generation. To analyze the motion of lyric words, we first apply a state-of-the-art scene text detector and recognizer to each video frame. Then, lyric-frame matching is performed to establish the optimal correspondence between lyric words and the frames. After fixing the motion trajectories of individual lyric words…
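The lyric-frame matching step described in the abstract can be illustrated with a small sketch. The excerpt does not specify the algorithm, so the code below is only an assumption: it treats matching as a monotonic dynamic-programming alignment in which each lyric word is assigned to one frame, frame indices never decrease along the lyric, and the total string similarity between lyric words and per-frame recognition results is maximized. The function names (`similarity`, `frame_score`, `align_lyrics_to_frames`) and the use of `difflib` as the similarity measure are hypothetical choices for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; tolerates recognition errors (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def frame_score(word, detections):
    """Best similarity between a lyric word and any word recognized in one frame."""
    return max((similarity(word, d) for d in detections), default=0.0)


def align_lyrics_to_frames(lyrics, frames):
    """Assign each lyric word to a frame index.

    lyrics: list of lyric words in singing order.
    frames: list (one entry per frame) of lists of recognized words.
    Returns a non-decreasing list of frame indices, one per lyric word,
    that maximizes the summed similarity (hypothetical objective).
    """
    n, m = len(lyrics), len(frames)
    if n == 0 or m == 0:
        return []

    # best[i][t]: best total score for words 0..i with word i placed at frame t.
    best = [[0.0] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]

    for t in range(m):
        best[0][t] = frame_score(lyrics[0], frames[t])

    for i in range(1, n):
        run_best, run_arg = float("-inf"), 0
        for t in range(m):
            # Track the best placement of the previous word at any frame <= t.
            if best[i - 1][t] > run_best:
                run_best, run_arg = best[i - 1][t], t
            best[i][t] = run_best + frame_score(lyrics[i], frames[t])
            back[i][t] = run_arg

    # Backtrack from the best final frame.
    t = max(range(m), key=lambda k: best[n - 1][k])
    path = [0] * n
    for i in range(n - 1, -1, -1):
        path[i] = t
        t = back[i][t]
    return path


if __name__ == "__main__":
    lyrics = ["hello", "world", "again"]
    frames = [["hello"], ["helo", "world"], [], ["again"]]
    print(align_lyrics_to_frames(lyrics, frames))  # [0, 1, 3]
```

In this toy input, the misrecognized "helo" in the second frame does not pull "hello" away from the first frame, and the empty third frame is skipped. Because the constraint is non-decreasing rather than strictly increasing, several lyric words may share a frame, which matches the fact that multiple words can be visible at once.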
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Handwritten Text Recognition Techniques
