Contrastive Graph Multimodal Model for Text Classification in Videos

Ye Liu; Changchong Lu; Chen Lin; Di Yin; Bo Ren

arXiv:2206.02343 · cs.CV · June 7, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a novel multimodal contrastive learning approach with a specialized CorrelationNet module for classifying texts in videos, addressing challenges like diverse layouts and limited labeled data.

Contribution

It pioneers the task of video text classification using multimodal fusion and contrastive learning, and proposes a new dataset for this purpose.

Findings

01. Effective in handling complex video text scenarios

02. Improves classification accuracy with contrastive learning

03. Demonstrates strong results on the TI-News dataset

Abstract

The extraction of text information in videos serves as a critical step towards semantic understanding of videos. It usually involves two steps: (1) text recognition and (2) text classification. To localize texts in videos, we can resort to a large number of text recognition methods based on OCR technology. However, to our knowledge, no existing work focuses on the second step, video text classification, which limits the guidance available to downstream tasks such as video indexing and browsing. In this paper, we are the first to address this new task of video text classification by fusing multimodal information to deal with the challenging scenario where different types of video texts may be confused with various colors, unknown fonts and complex layouts. In addition, we tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting…
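
The title and findings name contrastive learning as part of the approach, but this listing does not show the paper's actual loss. As a rough illustration only, the sketch below implements a generic InfoNCE-style contrastive loss over paired embeddings in pure Python. The function names (`l2_normalize`, `info_nce`), the `temperature` value, and the idea of pairing two views of the same text instance are assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's stated objective).

    anchors[i] and positives[i] are two embeddings of the same item;
    every other positives[j] in the batch serves as a negative for anchors[i].
    Returns the mean loss over the batch.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true pair.
        total += log_denom - logits[i]
    return total / len(a)


# Hypothetical toy embeddings: two items, two views each.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

In this toy setup the loss is small when each anchor is closest to its own pair and large when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.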

Taxonomy

Topics: Text and Document Classification Technologies · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications

Methods: Contrastive Learning