CONSENT: Context Sensitive Transformer for Bold Words Classification
Ionut-Catalin Sandu, Daniel Voinea, Alin-Ionut Popa

TL;DR
CONSENT introduces a context-sensitive transformer framework that classifies bold words in images, achieving state-of-the-art results on that task, and can be extended to other tasks such as determining game winners from image sequences.
Contribution
The paper proposes a novel transformer-based framework for context-dependent object classification, demonstrating its effectiveness on bold word detection and game outcome prediction.
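The core idea is that a word's bold/non-bold label depends on the other words in the same image. Self-attention can illustrate this. The following is a minimal NumPy sketch under stated assumptions, not the authors' architecture. It assumes each extracted word has already been encoded into a fixed-size feature vector. The class name `ContextBoldClassifier`, its single attention head, and its random (untrained) weights are all hypothetical. It omits training, the multi-layer transformer, and the network ensemble mentioned in the abstract.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ContextBoldClassifier:
    """Hypothetical single-layer, single-head self-attention classifier.

    Each word in an image is represented by a feature vector (assumed to come
    from some word-crop encoder). Self-attention lets every word's score depend
    on all other words in the same image, i.e. the decision is context-dependent.
    Weights are random: this shows the data flow, not a trained model.
    """

    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_out = rng.normal(0.0, scale, (dim,))
        self.b_out = 0.0

    def __call__(self, words: np.ndarray) -> np.ndarray:
        # words: (num_words, dim) features for all words in one image.
        q = words @ self.w_q
        k = words @ self.w_k
        v = words @ self.w_v
        # Scaled dot-product attention across all words of the image.
        attn = softmax(q @ k.T / np.sqrt(words.shape[1]), axis=-1)
        # Residual connection: each word keeps its own features plus context.
        h = words + attn @ v
        # Per-word logit, squashed to a probability of "bold".
        logits = h @ self.w_out + self.b_out
        return 1.0 / (1.0 + np.exp(-logits))


model = ContextBoldClassifier(dim=8)
words = np.random.default_rng(1).normal(size=(5, 8))  # 5 words, 8 features each
probs = model(words)

# Perturb only the last word; the first word's score still changes via attention.
perturbed = words.copy()
perturbed[4] += 5.0
probs_perturbed = model(perturbed)
print(probs.shape)  # (5,)
```

Because every word attends to all others, changing one word's features shifts the scores of the rest. A per-word classifier without attention would not have this property, which is the sense of "context-sensitive" illustrated here.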
Findings
Achieved state-of-the-art results in bold words detection.
Competitive performance in rock-paper-scissors game classification.
Framework is extensible to different visual classification tasks.
Abstract
We present CONSENT, a simple yet effective CONtext SENsitive Transformer framework for context-dependent object classification within a fully trainable, end-to-end deep learning pipeline. We demonstrate the proposed framework on the task of bold word detection, achieving state-of-the-art results. Given an image containing text of unknown font types (e.g. Arial, Calibri, Helvetica) in an unknown language, captured under varying illumination, angle distortion and scale, we extract all the words and learn a context-dependent binary classification (i.e. bold versus non-bold) using an end-to-end transformer-based neural network ensemble. To show the extensibility of our framework, we report results competitive with the state of the art on the game of rock-paper-scissors, training the model to determine the winner given a sequence of images depicting hand poses.
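The rock-paper-scissors target can be written as a small labeling rule. This illustrates the target only, not the CONSENT model, which learns the mapping end to end from images. The sketch assumes a two-image sequence with one pose per player and a three-way label (player A wins, player B wins, draw). The paper's actual sequence length and label scheme may differ, and `winner` and `BEATS` are hypothetical names.

```python
# Hypothetical labeling rule: each pose beats exactly one other pose.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def winner(pose_a: str, pose_b: str) -> int:
    """Return 1 if player A wins, 2 if player B wins, 0 for a draw."""
    if pose_a not in BEATS or pose_b not in BEATS:
        raise ValueError(f"unknown pose: {pose_a!r} or {pose_b!r}")
    if pose_a == pose_b:
        return 0
    return 1 if BEATS[pose_a] == pose_b else 2


print(winner("rock", "scissors"))  # 1
print(winner("rock", "paper"))     # 2
print(winner("paper", "paper"))    # 0
```

In the paper's setting the model receives only images, so it would have to capture both pose recognition and this rule implicitly.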
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings
