Timed text extraction from Taiwanese Kua-á-hì TV series
Tzu-Hung Huang, Yun-En Tsai, Yun-Ning Hung, Chih-Wei Wu, I-Chieh Wei, Li Su

TL;DR
This paper presents an interactive system combining OCR correction and speech/music detection to efficiently extract vocal segments and lyrics from Taiwanese opera TV series, facilitating music information retrieval tasks.
Contribution
It introduces a novel two-step approach that combines OCR-driven segmentation with Speech and Music Activity Detection (SMAD) to identify vocal segments with high precision in low-quality archival videos.
Findings
High-precision vocal segment detection achieved
Efficient extraction of lyrics and vocal segments
Supports MIR tasks like lyrics identification
Abstract
Taiwanese opera (Kua-á-hì), a major form of local theatrical tradition, was extensively adapted for television, notably by pioneers such as Iûnn Lē-hua. These videos are potentially valuable for in-depth studies of Taiwanese opera, but they are often of low quality and require substantial manual effort during data preparation. To streamline this process, we developed an interactive system for real-time OCR correction, together with a two-step approach that integrates OCR-driven segmentation with Speech and Music Activity Detection (SMAD) to identify vocal segments in archival episodes efficiently and with high precision. The resulting dataset, consisting of vocal segments and their corresponding lyrics, could potentially support various MIR tasks such as lyrics identification and tune retrieval. Code is available at https://github.com/z-huang/ocr-subtitle-editor .
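The abstract describes the two-step pipeline only at a high level. The following is a minimal sketch of how such a pipeline might fit together, not the authors' implementation (that lives in the linked repository). It assumes that OCR has already produced one subtitle string per sampled video frame, that a SMAD model has already produced per-frame boolean speech and music labels at a fixed hop size, and that "sung" can be approximated by speech and music being active at the same time. The names `merge_ocr_frames`, `select_vocal_segments`, the similarity threshold, and the co-activity rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Sequence


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds (exclusive)
    text: str


def merge_ocr_frames(frames: List[str], fps: float,
                     min_similarity: float = 0.8) -> List[Segment]:
    """Step 1 (hypothetical): turn per-frame OCR strings into timed subtitle segments.

    Consecutive frames whose OCR text is similar (ratio >= min_similarity) are
    treated as one subtitle line, which tolerates small OCR errors between frames.
    An empty string means no subtitle was detected in that frame.
    """
    segments: List[Segment] = []
    current: Optional[Segment] = None
    for i, raw in enumerate(frames):
        t = i / fps
        text = raw.strip()
        same_line = (current is not None and bool(text) and
                     SequenceMatcher(None, current.text, text).ratio() >= min_similarity)
        if same_line:
            continue  # extend the current subtitle; its first OCR reading is kept as the text
        if current is not None:  # close the running subtitle at this frame
            current.end = t
            segments.append(current)
            current = None
        if text:  # a new subtitle line starts here
            current = Segment(start=t, end=t, text=text)
    if current is not None:  # subtitle still on screen at the last frame
        current.end = len(frames) / fps
        segments.append(current)
    return segments


def coactive_ratio(seg: Segment, speech: Sequence[int],
                   music: Sequence[int], hop: float) -> float:
    """Fraction of SMAD frames inside seg where speech and music are both active."""
    first = int(seg.start / hop)
    last = max(first + 1, int(seg.end / hop))
    pairs = list(zip(speech[first:last], music[first:last]))
    if not pairs:
        return 0.0
    return sum(1 for s, m in pairs if s and m) / len(pairs)


def select_vocal_segments(segments: List[Segment], speech: Sequence[int],
                          music: Sequence[int], hop: float,
                          threshold: float = 0.5) -> List[Segment]:
    """Step 2 (hypothetical): keep OCR segments that SMAD suggests are sung."""
    return [s for s in segments if coactive_ratio(s, speech, music, hop) >= threshold]


# Toy usage: one sung line (speech + music) and one spoken line (speech only).
frames = ["", "la la la", "la la la", "la la lo", "", "hello", "hello", ""]
segs = merge_ocr_frames(frames, fps=1.0)
speech = [0, 1, 1, 1, 0, 1, 1, 0]
music = [1, 1, 1, 1, 1, 0, 0, 0]
vocal = select_vocal_segments(segs, speech, music, hop=1.0)
print([(s.start, s.end, s.text) for s in vocal])
```

Using OCR subtitle boundaries as the segmentation and SMAD only as a filter keeps the lyric text aligned with each kept segment, which is one plausible reason to order the two steps this way; a real system would also need the interactive correction step to fix OCR errors that the similarity merge cannot absorb.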
Taxonomy
Topics: Music and Audio Processing · Authorship Attribution and Profiling · Theater, Performance, and Music History
