AI based signage classification for linguistic landscape studies

Yuqin Jiang; Song Jiang; Jacob Algrim; Trevor Harms; Maxwell Koenen; Xinya Lan; Xingyu Li; Chun-Han Lin; Jia Liu; Jiayang Sun; Henry Zenger

arXiv:2510.22885 · cs.LG · October 28, 2025

TL;DR

This paper investigates using AI-powered language detection and OCR to automate signage classification in urban linguistic landscape studies. It reports 79% overall accuracy and highlights the need for hybrid human-AI workflows.

Contribution

The paper introduces an AI-based method for signage classification in linguistic landscape (LL) research that combines OCR with language detection, and evaluates it on a real-world dataset of 1,449 georeferenced photos from Honolulu Chinatown.
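The summary does not say which OCR engine or language detector the authors used. As a rough illustration only, the sketch below assumes OCR has already produced a text string for each sign and substitutes a simple Unicode-script heuristic (Han vs. Latin characters) for the AI language detector. The functions `script_of` and `classify_sign` and their labels are hypothetical, not the paper's method.

```python
import unicodedata
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Coarse script label for one character; None for digits, punctuation, spaces."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def classify_sign(ocr_text: str) -> str:
    """Hypothetical stand-in for language detection: label a sign by the scripts in its OCR text."""
    scripts = {s for s in map(script_of, ocr_text) if s is not None}
    if not scripts:
        return "no text"
    if len(scripts) == 1:
        return f"monolingual ({next(iter(scripts))})"
    return "multilingual (" + ", ".join(sorted(scripts)) + ")"


print(classify_sign("Wing Coffee 永記"))  # multilingual (Han, Latin)
print(classify_sign("OPEN 24 HRS"))       # monolingual (Latin)
```

Script is not the same as language: a Latin-script heuristic cannot tell English from Hawaiian or Vietnamese, which is one reason a trained language detector would be needed in practice.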

Findings

01

The AI model achieved 79% overall accuracy in signage classification

02

Identified five recurring types of mislabeling, including distortion and graffiti

03

The AI detects peripheral text that human annotators often ignore

Abstract

Linguistic Landscape (LL) research traditionally relies on manual photography and annotation of public signs to examine the distribution of languages in urban space. While such methods yield valuable findings, the process is time-consuming and difficult to scale to large study areas. This study explores the use of an AI-powered language detection method to automate LL analysis. Using Honolulu Chinatown as a case study, we constructed a georeferenced dataset of 1,449 photos collected by researchers and applied AI for optical character recognition (OCR) and language classification. We also manually validated the results to check accuracy. The model achieved an overall accuracy of 79%. Five recurring types of mislabeling were identified: distortion, reflection, degraded surface, graffiti, and hallucination. The analysis also reveals that the AI model treats all regions of an image…
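The abstract reports an overall accuracy from manual validation and names five error types, but does not describe how the tally was computed. A minimal sketch, assuming each image receives one AI label and one human label, and that the annotator records an error type whenever the two disagree; the `Validation` record, `summarize` function, and label strings are hypothetical, and the records below are toy values, not study data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The five mislabeling types named in the abstract.
ERROR_TYPES = ("distortion", "reflection", "degraded surface", "graffiti", "hallucination")


@dataclass
class Validation:
    """One image: the AI label, the human label, and (if they disagree) the error type."""
    ai_label: str
    human_label: str
    error_type: Optional[str] = None


def summarize(records: List[Validation]) -> Tuple[float, Counter]:
    """Return overall accuracy and a tally of error types among mismatched images."""
    if not records:
        raise ValueError("no validation records")
    correct = sum(r.ai_label == r.human_label for r in records)
    # Mismatches without a recognized error type are counted as "unspecified".
    errors = Counter(
        r.error_type if r.error_type in ERROR_TYPES else "unspecified"
        for r in records
        if r.ai_label != r.human_label
    )
    return correct / len(records), errors


# Toy records for illustration only.
records = [
    Validation("Chinese", "Chinese"),
    Validation("English", "English"),
    Validation("English", "Chinese", "reflection"),
    Validation("bilingual", "bilingual"),
]
accuracy, errors = summarize(records)
print(f"accuracy = {accuracy:.0%}")  # accuracy = 75%
print(dict(errors))                  # {'reflection': 1}
```

Overall accuracy here is simply the fraction of images whose AI label matches the human label; per-class precision and recall would need the full confusion matrix.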