# LICS: Locating Inter-Character Spaces for Multilingual Scene Text Detection

**Authors:** Po-Chyi Su, Meng-Chieh Lee, Yi-Ting Tung, Li-Zhu Chen, Chih-Hung Han, Tien-Ying Kuo

PMC · DOI: 10.3390/s26010197 · Sensors (Basel, Switzerland) · 2025-12-27

## TL;DR

This paper introduces LICS (Locating Inter-Character Spaces), a multilingual scene text detection method that locates the gaps between characters as language-agnostic cues, removing the need for character-level annotations in target languages.

## Contribution

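LICS uses inter-character spaces as language-agnostic structural cues. It is trained in two stages: first on synthetic data with precise character gap annotations, then with weakly supervised learning on real-world datasets that carry only word-level labels. The paper also introduces CSVT, a new annotated streetscape dataset.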

## Key findings

- LICS achieves strong performance on ICDAR and Total-Text benchmarks, especially for Asian scripts.
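- The CSVT (Character-Labeled Street View Text) dataset provides approximately 20,000 annotated streetscape images, labeled under standardized principles covering text location, content, and language type.
- Weakly supervised learning eliminates the need for character-level annotations in target languages while maintaining robust detection performance.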

## Abstract

Scene text detection in multilingual environments poses significant challenges. Traditional detection methods often struggle with language-specific features and require extensive annotated training data for each language, making them less practical for multilingual contexts. The diversity of character shapes, sizes, and orientations in natural scenes, along with text deformation and partial occlusions, further complicates the task of detection. This paper introduces LICS (Locating Inter-Character Spaces), a method that detects inter-character gaps as language-agnostic structural cues, enabling more feasible multilingual text detection. A two-stage approach is employed: first, we train on synthetic data with precise character gap annotations, and then apply weakly supervised learning to real-world datasets with word-level labels. The weakly supervised learning framework eliminates the need for character-level annotations in target languages, substantially reducing the annotation burden while maintaining robust performance. Experimental results on the ICDAR and Total-Text benchmarks demonstrate the strong performance of LICS, particularly on Asian scripts. We also introduce CSVT (Character-Labeled Street View Text), a new scene-text dataset comprising approximately 20,000 carefully annotated streetscape images. A set of standardized labeling principles is established to ensure consistent annotation of text locations, content, and language types. CSVT is expected to facilitate more advanced research and development in multilingual scene-text analysis.
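To make the idea of an inter-character gap concrete, here is a minimal sketch of how gap regions could be derived from character boxes, as one might do when labeling synthetic data. It assumes horizontal text and axis-aligned boxes; the function name `inter_character_gaps` and the box format are hypothetical, and this summary does not state how the paper actually represents gaps.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def inter_character_gaps(char_boxes: List[Box]) -> List[Box]:
    """Return one gap box between each pair of horizontally adjacent characters.

    Assumption: the characters belong to a single horizontal text line.
    A word with N characters yields N - 1 gaps.
    """
    # Order characters left to right by their left edge.
    ordered = sorted(char_boxes, key=lambda b: b[0])
    gaps: List[Box] = []
    for left, right in zip(ordered, ordered[1:]):
        x0, x1 = left[2], right[0]
        if x1 < x0:
            # Overlapping characters: collapse the gap to the midpoint.
            x0 = x1 = (x0 + x1) / 2.0
        # The gap spans the combined vertical extent of both neighbors.
        y0 = min(left[1], right[1])
        y1 = max(left[3], right[3])
        gaps.append((x0, y0, x1, y1))
    return gaps


if __name__ == "__main__":
    word = [(0, 0, 10, 20), (14, 2, 24, 22), (30, 0, 40, 20)]
    for gap in inter_character_gaps(word):
        print(gap)
```

Because the gap is defined purely by geometry between neighboring glyphs, nothing in this sketch depends on the script being Latin, Chinese, or any other, which is the language-agnostic property the abstract emphasizes.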

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12788283/full.md

## Figures

38 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12788283/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12788283/full.md

---
Source: https://tomesphere.com/paper/PMC12788283