Visual Text Correction

Amir Mazaheri; Mubarak Shah

arXiv:1801.01967 · cs.CV · January 1, 2019

TL;DR

This paper introduces the novel task of Visual Text Correction, aiming to identify and replace inaccurate words in video descriptions by leveraging semantic relationships between videos and text.

Contribution

It proposes a deep network model for detecting and correcting inaccuracies in textual video descriptions, and introduces a method to automatically create a large dataset for this task.
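One simple way to picture automatic dataset creation for this task is to take a correct description and swap a single word for a different one, recording the position and the original word as labels. The sketch below is an illustration under that assumption only; the function name `make_vtc_sample`, the toy vocabulary, and the uniform random replacement are all hypothetical and are not taken from the paper, which may select replacement words differently.

```python
import random


def make_vtc_sample(sentence, vocabulary, rng):
    """Corrupt one word of a correct sentence.

    Returns (corrupted_words, index_of_replaced_word, original_word).
    Hypothetical helper: uniform random replacement is an assumption.
    """
    words = sentence.split()
    idx = rng.randrange(len(words))          # position to corrupt
    original = words[idx]
    # Only consider replacements that actually differ from the original word.
    candidates = [w for w in vocabulary if w != original]
    corrupted = words.copy()
    corrupted[idx] = rng.choice(candidates)
    return corrupted, idx, original


rng = random.Random(0)                        # fixed seed for repeatability
toy_vocab = ["runs", "sits", "dog", "man", "car", "opens"]
corrupted, idx, original = make_vtc_sample("a man opens the door", toy_vocab, rng)
print(" ".join(corrupted), "| replaced position:", idx, "| original word:", original)
```

Each sample produced this way carries its own supervision: the index serves as the detection label and the original word as the correction label, so no manual annotation is needed.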

Findings

01 The proposed model effectively detects and corrects inaccurate words.

02 The dataset construction approach enables large-scale training and evaluation.

03 Results demonstrate the model's strong performance on the VTC task.

Abstract

Videos, images, and sentences are mediums that can express the same semantics. One can imagine a picture by reading a sentence or can describe a scene with some words. However, even small changes in a sentence can cause a significant semantic inconsistency with the corresponding video/image. For example, by changing the verb of a sentence, the meaning may drastically change. There have been many efforts to encode a video/sentence and decode it as a sentence/video. In this research, we study a new scenario in which both the sentence and the video are given, but the sentence is inaccurate. A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description. This paper introduces a new problem, called Visual Text Correction (VTC), i.e., finding and replacing an inaccurate word in the textual description of a video. We…
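The problem statement above splits naturally into two steps: detecting which word is inaccurate, then choosing a replacement for it. The sketch below shows only that interface. The per-word inaccuracy scores and the candidate scores are hand-written stand-ins for what a model conditioned on the video and the sentence would produce; `correct_sentence` and `score_candidates` are hypothetical names, and nothing here reflects the paper's actual architecture.

```python
def correct_sentence(words, inaccuracy_scores, score_candidates):
    """Detect the most suspicious word, then replace it.

    words: list of tokens in the given (possibly inaccurate) description.
    inaccuracy_scores: one score per word; higher means more likely wrong.
    score_candidates: callable mapping a position to {candidate_word: score}.
    """
    # Detection: pick the position with the highest inaccuracy score.
    idx = max(range(len(words)), key=lambda i: inaccuracy_scores[i])
    # Correction: pick the highest-scoring candidate for that position.
    candidates = score_candidates(idx)
    replacement = max(candidates, key=candidates.get)
    corrected = words.copy()
    corrected[idx] = replacement
    return idx, corrected


words = ["a", "man", "closes", "the", "door"]
scores = [0.05, 0.10, 0.80, 0.02, 0.03]       # made-up values for illustration


def toy_candidates(position):
    # Made-up candidate scores; a real system would use the video here.
    return {"opens": 0.7, "paints": 0.2, "kicks": 0.1}


detected, corrected = correct_sentence(words, scores, toy_candidates)
print(detected, " ".join(corrected))
```

Keeping detection and correction as separate steps makes it possible to evaluate each one on its own, for example detection accuracy independent of whether the replacement word is right.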

Code & Models

Repositories

amirmazaheri1990/Visual-Text-Correction (tf)
