Block-level Text Spotting with LLMs
Ganesh Bannur, Bharadwaj Amrutur

TL;DR
This paper introduces BTS-LLM, a novel approach that uses large language models to group detected text lines into blocks and to order the lines within each block, enabling context-aware text extraction and error correction for downstream applications.
Contribution
The paper presents a method for block-level text spotting that combines line-level text detection and recognition with LLM-based grouping and ordering of lines, a novel application of LLMs in this domain.
Findings
Effective grouping of lines into text blocks.
LLM-based ordering improves text coherence.
Ability to rectify recognition errors using LLMs.
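The findings do not say how lines are assigned to blocks, and the summary attributes grouping to the LLM. For contrast, here is a minimal sketch of a purely geometric, non-LLM grouping baseline, assuming each recognized line comes with an axis-aligned bounding box. Two lines are linked when they overlap horizontally and their vertical gap is at most `gap_factor` times the smaller line height; union-find then merges transitive links into blocks. `Line`, `group_lines`, and `gap_factor` are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A recognized text line with an axis-aligned box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _horizontal_overlap(a: Line, b: Line) -> float:
    # Width of the shared x-interval; 0 when the boxes sit side by side.
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def _vertical_gap(a: Line, b: Line) -> float:
    # Empty space between the boxes along y; 0 when they overlap vertically.
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def group_lines(lines: list[Line], gap_factor: float = 1.0) -> list[list[Line]]:
    """Hypothetical geometric baseline: link nearby, horizontally aligned lines."""
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            a, b = lines[i], lines[j]
            height = min(a.y1 - a.y0, b.y1 - b.y0)
            if _horizontal_overlap(a, b) > 0 and _vertical_gap(a, b) <= gap_factor * height:
                parent[find(i)] = find(j)

    # Collect lines per root; blocks appear in order of their first line.
    blocks: dict[int, list[Line]] = {}
    for i, line in enumerate(lines):
        blocks.setdefault(find(i), []).append(line)
    return list(blocks.values())


lines = [
    Line("Hello", 0, 0, 100, 10),
    Line("world", 0, 12, 100, 22),
    Line("Far away", 0, 100, 100, 110),
]
blocks = group_lines(lines)
print([[l.text for l in b] for b in blocks])  # [['Hello', 'world'], ['Far away']]
```

A fixed geometric rule like this cannot tell whether two adjacent lines actually belong to the same passage, which is the kind of semantic judgment the paper delegates to the LLM.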
Abstract
Text spotting has seen tremendous progress in recent years, yielding performant techniques that can extract text at the character, word or line level. However, extracting blocks of text from images (block-level text spotting) is relatively unexplored. Blocks contain more context than individual lines, words or characters, so block-level text spotting would enhance downstream applications, such as translation, that benefit from added context. We propose a novel method, BTS-LLM (Block-level Text Spotting with LLMs), to identify text at the block level. BTS-LLM has three parts: 1) detecting and recognizing text at the line level, 2) grouping lines into blocks and 3) finding the best order of lines within a block using a large language model (LLM). We aim to exploit the strong semantic knowledge in LLMs for accurate block-level text spotting. Consequently, if the text spotted is…
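Part 3 selects an order for the lines of a block using an LLM; the abstract does not say how candidate orders are scored. A minimal sketch, assuming the LLM can be wrapped as a function that returns a plausibility score (for example, a log-likelihood) for a piece of text: enumerate every permutation of the lines and keep the highest-scoring one. `best_order`, `toy_score`, and `max_exhaustive` are hypothetical names, and `toy_score` is a bigram-matching stand-in for an LLM, not a language model.

```python
from itertools import permutations
from typing import Callable, Sequence

# Toy stand-in for an LLM: rewards adjacent word pairs it "knows".
KNOWN_BIGRAMS = {
    ("the", "quick"), ("quick", "brown"), ("brown", "fox"), ("fox", "jumps"),
    ("jumps", "over"), ("over", "the"), ("the", "lazy"), ("lazy", "dog"),
}


def toy_score(text: str) -> float:
    """Hypothetical scorer: count known bigrams across the joined text."""
    words = text.lower().split()
    return float(sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:])))


def best_order(
    lines: Sequence[str],
    score_fn: Callable[[str], float],
    max_exhaustive: int = 8,
) -> list[str]:
    """Return the permutation of `lines` whose joined text scores highest."""
    if len(lines) > max_exhaustive:
        # n! candidates: exhaustive search is only feasible for short blocks.
        raise ValueError(f"too many lines for exhaustive search: {len(lines)}")
    best, best_score = list(lines), float("-inf")
    for perm in permutations(lines):
        score = score_fn("\n".join(perm))
        if score > best_score:  # strict '>' keeps the first of tied orders
            best, best_score = list(perm), score
    return best


shuffled = ["jumps over", "the quick", "the lazy dog", "brown fox"]
print(best_order(shuffled, toy_score))
# ['the quick', 'brown fox', 'jumps over', 'the lazy dog']
```

Because the candidate count grows as n!, the `max_exhaustive` guard stops long blocks; those would need a cheaper search, such as beam search over partial orders, with the same scoring interface.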
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Speech Recognition and Synthesis
