V-SAT: Video Subtitle Annotation Tool

Arpita Kundu; Joyita Chakraborty; Anindita Desarkar; Aritra Sen; Srushti Anil Patil; Vishwanathan Raman

arXiv:2510.24180 · cs.LG · October 29, 2025

TL;DR

V-SAT is an automated framework that detects and corrects a wide range of subtitle quality issues by combining LLMs, VLMs, image processing, and ASR, reducing manual post-editing and improving subtitle synchronization and accuracy.

Contribution

The paper introduces V-SAT, the first unified system combining LLMs, VLMs, image processing, and ASR for automatic, comprehensive subtitle correction and annotation.

Findings

1. SUBER score reduced from 9.6 to 3.54

2. F1-scores of ~0.80 for image mode issues

3. High human-in-the-loop validation quality

Abstract

The surge of audiovisual content on streaming platforms and social media has heightened the demand for accurate and accessible subtitles. However, existing subtitle generation methods, primarily speech-based transcription or OCR-based extraction, suffer from several shortcomings, including poor synchronization, incorrect or harmful text, inconsistent formatting, inappropriate reading speeds, and the inability to adapt to dynamic audio-visual contexts. Current approaches often address isolated issues, leaving post-editing as a labor-intensive and time-consuming process. In this paper, we introduce V-SAT (Video Subtitle Annotation Tool), a unified framework that automatically detects and corrects a wide range of subtitle quality issues. By combining Large Language Models (LLMs), Vision-Language Models (VLMs), Image Processing, and Automatic Speech Recognition (ASR), V-SAT leverages…
