ConText: Driving In-context Learning for Text Removal and Segmentation
Fei Zhang, Pei Zhang, Baosong Yang, Fei Huang, Yanfeng Wang, Ya Zhang

TL;DR
This paper introduces ConText, a novel visual in-context learning approach for text removal and segmentation that enhances reasoning through task chaining, context-aware aggregation, and self-prompting, achieving state-of-the-art results.
Contribution
The paper proposes a new task-chaining compositor and context-aware aggregation techniques to improve in-context learning for optical character recognition tasks.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Effectively handles visual heterogeneity with self-prompting.
Enhances reasoning with task chaining and context integration.
Abstract
This paper presents the first study on adapting the visual in-context learning (V-ICL) paradigm to optical character recognition tasks, specifically text removal and segmentation. Most existing V-ICL generalists employ a reasoning-as-reconstruction approach: they use a straightforward image-label compositor as the prompt and query input, then mask the query label to generate the desired output. This direct prompt confines the model to a challenging single-step reasoning process. To address this, we propose a task-chaining compositor in the form of image-removal-segmentation, providing an enhanced prompt that elicits reasoning with enriched intermediates. Additionally, we introduce context-aware aggregation, which integrates the chained prompt pattern into the latent query representation and thereby strengthens the model's in-context reasoning. We also consider the…
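The difference between the plain image-label compositor and the task-chaining compositor described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the grid layout (prompt row on top, query row below), the zero-filled placeholder tiles, and the names `compose_row` and `build_canvas` are all assumptions made for illustration. It only shows how a chained image-removal-segmentation prompt leaves two query tiles to reconstruct instead of one.

```python
import numpy as np


def compose_row(tiles):
    """Concatenate equally sized H x W x C tiles side by side."""
    return np.concatenate(tiles, axis=1)


def build_canvas(prompt, query_image, chained=True):
    """Build a prompt/query canvas and a mask of the pixels to reconstruct.

    prompt: dict with 'image', 'removal', 'segmentation' arrays (hypothetical keys),
            each with the same shape as query_image.
    chained=True  -> image-removal-segmentation (task-chaining) layout.
    chained=False -> plain image-label layout (segmentation as the label).
    """
    keys = ["image", "removal", "segmentation"] if chained else ["image", "segmentation"]
    prompt_row = compose_row([prompt[k] for k in keys])

    # The query row holds the query image followed by blank (masked) label tiles.
    h, w = query_image.shape[:2]
    blanks = [np.zeros_like(query_image) for _ in keys[1:]]
    query_row = compose_row([query_image] + blanks)

    canvas = np.concatenate([prompt_row, query_row], axis=0)

    # Mask is True exactly where the model must fill in the query labels.
    mask = np.zeros(canvas.shape[:2], dtype=bool)
    mask[h:, w:] = True
    return canvas, mask


# Toy data: 4 x 4 RGB tiles filled with constant values.
tile = lambda v: np.full((4, 4, 3), v, dtype=np.float32)
prompt = {"image": tile(0.2), "removal": tile(0.5), "segmentation": tile(1.0)}
query = tile(0.3)

chained_canvas, chained_mask = build_canvas(prompt, query, chained=True)
plain_canvas, plain_mask = build_canvas(prompt, query, chained=False)
```

In this toy layout the chained canvas is one tile wider than the plain one, and its mask covers both the removal and the segmentation slot of the query row, so the removal result acts as an intermediate on the way to the segmentation output.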
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
