Block-level Text Spotting with LLMs

Ganesh Bannur; Bharadwaj Amrutur

arXiv:2406.13208 · cs.CV · June 21, 2024

TL;DR

This paper introduces BTS-LLM, which detects and recognizes text lines in images, groups them into blocks, and uses a large language model to order the lines within each block. The resulting blocks give downstream applications such as translation more context, and the LLM can also help correct recognition errors.

Contribution

The paper combines line-level detection and recognition with LLM-based grouping and ordering to perform block-level text spotting, a task the authors describe as relatively unexplored.

Findings

01. Effective grouping of lines into text blocks.

02. LLM-based ordering improves text coherence.

03. Ability to rectify recognition errors using LLMs.

Abstract

Text spotting has seen tremendous progress in recent years, yielding performant techniques that can extract text at the character, word or line level. However, extracting blocks of text from images (block-level text spotting) is relatively unexplored. Blocks contain more context than individual lines, words or characters, so block-level text spotting would enhance downstream applications, such as translation, that benefit from added context. We propose a novel method, BTS-LLM (Block-level Text Spotting with LLMs), to identify text at the block level. BTS-LLM has three parts: 1) detecting and recognizing text at the line level, 2) grouping lines into blocks and 3) finding the best order of lines within a block using a large language model (LLM). We aim to exploit the strong semantic knowledge in LLMs for accurate block-level text spotting. Consequently, if the text spotted is…
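The three-part structure described in the abstract can be illustrated with a minimal Python sketch. Nothing below is taken from the paper's implementation: the `Line` record, the geometric grouping rule in `group_lines`, the brute-force search in `best_order`, and the `toy_score` function are all hypothetical stand-ins. In particular, `toy_score` replaces the LLM with a tiny bigram lookup so the example runs on its own; in a real system that slot would presumably be filled by a language-model likelihood or similar signal.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Line:
    """One recognized text line (hypothetical record; step 1 is assumed done)."""
    text: str
    x: float  # left edge of the line's bounding box
    y: float  # top edge of the line's bounding box
    h: float  # height of the line's bounding box


def group_lines(lines: Sequence[Line], max_gap_ratio: float = 1.0) -> List[List[Line]]:
    """Step 2 (assumed rule): greedily attach each line to the first block whose
    last line is close vertically and roughly left-aligned with it."""
    blocks: List[List[Line]] = []
    for line in sorted(lines, key=lambda l: (l.y, l.x)):
        for block in blocks:
            last = block[-1]
            gap = line.y - (last.y + last.h)
            aligned = abs(line.x - last.x) <= 2 * last.h
            if gap <= max_gap_ratio * last.h and aligned:
                block.append(line)
                break
        else:
            # No existing block accepted the line, so it starts a new one.
            blocks.append([line])
    return blocks


def best_order(block: Sequence[Line], score: Callable[[str], float]) -> List[Line]:
    """Step 3 (assumed search): try every permutation of the block's lines and
    keep the one whose joined text scores highest. Factorial cost, so this is
    only practical for small blocks."""
    return list(max(permutations(block),
                    key=lambda p: score(" ".join(l.text for l in p))))


# Stand-in for an LLM-based score: counts adjacent word pairs found in a tiny table.
KNOWN_BIGRAMS = {("quick", "brown")}


def toy_score(text: str) -> float:
    words = text.lower().split()
    return sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:]))
```

With two side-by-side columns of lines as input, `group_lines` separates them by left edge, and `best_order(block, toy_score)` prefers the ordering whose joined text contains the known word pair. Swapping `toy_score` for a model-based scorer changes only the `score` argument.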

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Speech Recognition and Synthesis