CONSENT: Context Sensitive Transformer for Bold Words Classification

Ionut-Catalin Sandu; Daniel Voinea; Alin-Ionut Popa

arXiv:2205.07683 · cs.CV · May 17, 2022 · 1 citation

TL;DR

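CONSENT is a context-sensitive transformer framework that classifies the words in an image as bold or non-bold, achieving state-of-the-art results on that task. The same framework extends to other context-dependent tasks, such as determining the winner of a rock-paper-scissors game from a sequence of two hand-pose images, where it is competitive with the state of the art.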

Contribution

The paper proposes a novel transformer-based framework for context-dependent object classification, demonstrating its effectiveness on bold word detection and game outcome prediction.
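
To make "context-dependent" concrete for the game outcome task (rock-paper-scissors, per the abstract), the following minimal sketch shows how a winner label could be derived from two pose labels. The `Pose` enum, the `BEATS` table, and the 0/1/2 label encoding are illustrative assumptions, not the authors' label scheme; the paper's model predicts the outcome from images rather than from symbolic poses.

```python
from enum import Enum


class Pose(Enum):
    # Hypothetical pose labels; the paper works from images of hands.
    ROCK = "rock"
    PAPER = "paper"
    SCISSORS = "scissors"


# Each pose maps to the pose it defeats.
BEATS = {
    Pose.ROCK: Pose.SCISSORS,
    Pose.PAPER: Pose.ROCK,
    Pose.SCISSORS: Pose.PAPER,
}


def winner(first: Pose, second: Pose) -> int:
    """Return 0 for a draw, 1 if the first pose wins, 2 if the second wins.

    The 0/1/2 encoding is an assumption made for this example only.
    """
    if first == second:
        return 0
    return 1 if BEATS[first] == second else 2
```

The same pose yields different labels depending on its partner: `winner(Pose.ROCK, Pose.SCISSORS)` and `winner(Pose.ROCK, Pose.PAPER)` disagree, so no classifier that looks at one image in isolation can produce the label.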

Findings

01. Achieves state-of-the-art results in bold word detection.
02. Shows competitive performance in rock-paper-scissors game classification.
03. The framework is extensible to other visual classification tasks.

Abstract

We present CONSENT, a simple yet effective CONtext SENsitive Transformer framework for context-dependent object classification within a fully trainable end-to-end deep learning pipeline. We exemplify the proposed framework on the task of bold word detection, demonstrating state-of-the-art results. Given an image containing text of unknown font type (e.g. Arial, Calibri, Helvetica) and unknown language, taken under varying illumination, angle distortion and scale, we extract all the words and learn a context-dependent binary classification (i.e. bold versus non-bold) using an end-to-end transformer-based neural network ensemble. To demonstrate the extensibility of our framework, we show competitive results against the state of the art on the game of rock-paper-scissors by training the model to determine the winner given a sequence of two pictures depicting hand poses.
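
As a rough illustration of how a transformer lets each word's representation depend on the other words in the image, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not the authors' architecture: it uses identity projections instead of learned query/key/value weights, a single head, and no classification layer, and the two-dimensional `word_features` values are made-up placeholders for whatever per-word features a real pipeline would extract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention.

    Queries, keys and values are the input features themselves
    (identity projections), which is a simplification for this sketch.
    Returns the context-mixed outputs and the attention weights.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, features)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical per-word feature vectors for three words in one image.
word_features = [[0.2, 0.1], [0.9, 0.8], [0.25, 0.15]]
contextual, attn = self_attention(word_features)
```

Each row of `attn` sums to one, and each row of `contextual` is a weighted mix of all word features, so a downstream bold versus non-bold decision for one word can take the surrounding words into account.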

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings